
Visual Sentiment Analysis with Noisy Labels by Reweighting Loss

1st Lin Wang, 2nd Xiangmin Xu, 3rd Kailing Guo, 4th Bolun Cai
School of Electronic and Information Engineering
South China University of Technology
Guangzhou, China, 510641
wanglinphd@gmail.com, {xmxu,guokl}@scut.edu.cn, caibolun@gmail.com

Abstract—Visual sentiment analysis of online user-generated content is important for many social media analysis tasks. However, label noise is common in sentiment analysis datasets and deteriorates classification performance. To address this issue, we propose a novel visual sentiment analysis method based on loss reweighting that improves model robustness to label noise. First, a CNN is pre-trained with softmax loss on the dataset with noisy labels. Second, the noise matrix is estimated by re-sorting and re-positioning the probabilities predicted by the pre-trained CNN. Third, by converting the noise estimate into a loss weight, the degradation of sentiment classification performance caused by noisy labels can be compensated by re-training the network with this reweighting loss. We conduct experiments on public sentiment datasets, including the Sentibank and Twitter datasets, and demonstrate that the proposed method outperforms state-of-the-art results.
Index Terms—visual sentiment analysis, noisy labels, reweighting loss

I. INTRODUCTION

Fig. 1. Samples with noisy labels in the Sentibank dataset.

With the development of network bandwidth, cloud storage, and mobile applications, social networks have become an integral part of people's lives. People tend to express and convey their feelings by sharing text, images, and videos. Therefore, sentiment analysis has become an important task in social media analysis. Compared to text, visual media (images and videos) deliver richer sentiment information and cause stronger resonance. Thereby, visual sentiment analysis is considered an important task in human-machine interaction [1]. Recently, visual sentiment analysis has attracted widespread attention in the field of artificial intelligence.

In this paper, we focus on visual sentiment analysis of images. In early studies, image sentiment analysis was mainly characterized by low-level features (e.g., color [2], texture [3], shape [4]) and middle-level features (e.g., Sentribute [5], principles-of-art features [6]). Because visual sentiment analysis usually involves high-level semantic relationships, the performance of traditional sentiment analysis is far from satisfactory. To address this problem, methods with stronger high-level abstract representation and spatial relationship understanding should be used.

Deep convolutional neural networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks (e.g., image classification [7], object detection [8], and image segmentation [9]). Due to their powerful representational capability, CNNs have recently also been used for visual sentiment analysis [10]–[14]. Despite achieving excellent performance, CNN-based methods require large amounts of labeled samples for training. Since manual labeling is difficult and expensive, an alternative solution is to collect materials from the Internet.

For example, Sentibank [15], a widely used sentiment dataset, extracts images from Flickr¹ by selecting sentimental keywords (adjective noun pairs, ANPs) from a sentiment lexicon as the labels. Collecting sentiment images from social networks through keyword searching introduces unreliable sample labels into the dataset. As shown in Fig. 1, the images in the upper row are considered positive and those in the lower row are considered negative. As we can see, the images with red bounding boxes do not match their real labels.

Although CNNs have been proved to be robust to a certain amount of label noise [16], they still suffer network degeneration when there is too much noise. To address the degeneration of visual sentiment classifier performance caused by label noise, two kinds of methods are used. The first one is label noise cleaning.

¹http://www.flickr.com/
Fig. 2. The framework for visual sentiment analysis. (1) GoogLeNet is pre-trained with softmax loss to obtain prediction probabilities. (2) The loss weight is converted from the noise matrix, which is estimated by probability remapping. (3) GoogLeNet is re-trained with this reweighting loss.

You et al. [14] proposed a progressive CNN (PCNN) to remove ambiguous data that affect classification performance. Wu et al. [17] identified data with noisy labels by semantic matching between the ANPs and the labels. The second one is adding extra supervision information. In [18], a deep coupled adjective and noun (DCAN) neural network is proposed with adjective-noun mutual supervision to reduce the influence of noise. Sun et al. [19] proposed a method to discover affective regions of the training images. With this extra supervision information, N object proposals are generated from a query image and ranked according to their objectness scores, and then the top K regions are selected as affective regions.

Despite their success, the above methods have obvious shortcomings. Label noise cleaning reduces the number of training samples and may lose useful information, and it is hard or computationally expensive to obtain extra supervision information. Note that visual sentiment analysis of data collected from the Internet is intrinsically a noisy label problem. The work in [20] deals with label noise by an importance reweighting loss, which has theoretical and experimental guarantees and is popular in recent research on label noise. However, it adopts the Kullback-Leibler Importance Estimation Procedure (KLIEP) [21] to estimate the conditional probability and the noise ratio, which requires numerical optimization to obtain the solution. This process incurs massive computation when the number of training examples becomes large. In [22], a CNN-based noise estimation method was proposed, which only requires pre-training on the data with noisy labels to obtain the prediction probabilities of all samples. Nevertheless, it introduces an empirical parameter α = 0.97 in noise estimation, which is inconsistent with its theoretical analysis.

In this paper, we propose a novel loss reweighting method for visual sentiment analysis with noisy labels, which overcomes the above problems: the proposed method only requires noisy-label samples without any additional constraints, and estimates the noise ratio without introducing any empirical parameters. Our basic idea is to derive the loss weight by estimating the noise matrix from the corrupted data. We propose an algorithm that carries out this noise estimation by re-sorting and re-positioning the prediction probabilities. Furthermore, our proposed method outperforms the state-of-the-art results on sentiment datasets. The contributions of this paper are as follows:
• We propose a novel CNN-based noise estimation method on large visual sentiment datasets by probability remapping.
• We propose an improved reweighting loss to resist the degeneration of CNN performance caused by noisy labels.

II. APPROACHES

In this section, we introduce the proposed visual sentiment analysis method based on loss reweighting. As shown in Fig. 2, first, sentiment images are fed into GoogLeNet for pre-training with softmax loss. Second, we estimate the noise matrix via remapping the predicted probabilities. Finally, by converting the estimated noise matrix into a softmax loss weight, we retrain GoogLeNet on the sentiment dataset with this reweighting loss.

A. Pre-training with softmax loss

Convolutional neural networks have proven to have excellent performance in classification and regression problems, e.g., achieving state-of-the-art performance in the ImageNet Challenge. Visual sentiment recognition can also be seen as image classification; the difference is that sentiment recognition involves higher-level abstract semantic information. High-level abstract semantic understanding requires the viewer to recognize the objects in an image and understand their relationships. In traditional image classification tasks, images in the same class contain similar objects and are not complex. However, for visual sentiment recognition, each class may contain different kinds of objects, and the objects are intertwined and complicated. In addition, image sentiment has a certain degree of subjectivity, which brings difficulty to the establishment of
sentiment datasets. It is for this reason that sentiment datasets are mainly collected from the Internet. However, this collection method introduces corrupted labels into the dataset. Therefore, we propose a CNN-based method to estimate the noise rate by pre-training with softmax loss.

We adopt GoogLeNet as the architecture of the pre-training network for sentiment analysis. Considering that stretching an object has a great influence on sentiment, we keep the aspect ratio of each image constant during resizing. Since there are only two output classes, positive and negative, we redesign the output of the last layer. Softmax, a widely used classification loss, is used as our pre-training loss; it makes samples within a class agglomerate and samples between classes separable. Different from the traditional binary softmax loss, (x, y) and (x, ŷ) here denote the 'clean' sample and the observed sample, respectively; y and ŷ respectively represent the true label and the observed label, with y, ŷ ∈ {−1, +1}. Therefore, the binary softmax loss can be mathematically represented as

l(x_i, ŷ_i) = −(1/2N) Σ_{i=1}^{N} (1 + ŷ_i) log p̂(ŷ_i = +1|x_i) − (1/2N) Σ_{i=1}^{N} (1 − ŷ_i) log p̂(ŷ_i = −1|x_i),   (1)

where x_i and ŷ_i respectively represent the i-th feature vector and label. The prediction probability for each sample is

p̂(ŷ_i = +1|x_i) = exp(z_{+1}) / (exp(z_{+1}) + exp(z_{−1})),
p̂(ŷ_i = −1|x_i) = exp(z_{−1}) / (exp(z_{+1}) + exp(z_{−1})),   (2)

where z_{+1} and z_{−1} are the positive and negative outputs of the network.
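For concreteness, the following minimal NumPy sketch (our own illustration; the paper releases no code, and the array names are assumptions) computes the two-class softmax probabilities of Eq. (2) and the loss of Eq. (1) for a batch of logits:

```python
import numpy as np

def binary_softmax_loss(z_pos, z_neg, y_obs):
    """Eq. (1)/(2): mean softmax loss over observed labels y_obs in {-1, +1}.

    z_pos, z_neg : network outputs z_{+1}, z_{-1} for each sample (1-D arrays).
    """
    # Eq. (2): two-class softmax, stabilized by subtracting the per-sample max.
    m = np.maximum(z_pos, z_neg)
    e_pos, e_neg = np.exp(z_pos - m), np.exp(z_neg - m)
    p_pos = e_pos / (e_pos + e_neg)
    # Eq. (1): (1 + y)/2 selects positive-labeled samples, (1 - y)/2 negative ones.
    n = len(y_obs)
    return -np.sum((1 + y_obs) * np.log(p_pos)
                   + (1 - y_obs) * np.log(1.0 - p_pos)) / (2 * n)
```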
B. Noise matrix estimation by remapping probability

Large-scale sentiment dataset materials are mainly crawled from network data. The label noise among the data samples can be seen as random noise [23]. We assume that the probability of positive images being assigned to negative is ρ_{+1} = p(ŷ = −1|y = +1) and that of negative images being assigned to positive is ρ_{−1} = p(ŷ = +1|y = −1). Therefore, the noise matrix T can be represented as

T = [ 1 − ρ_{−1}    ρ_{−1}
      ρ_{+1}        1 − ρ_{+1} ],   (3)

where ρ_{+1}, ρ_{−1} ∈ [0, 0.5). Assume that the total number of images in the dataset is N, the number of true positive images is N_{+1}, and the number of true negative images is N_{−1}. With this noise matrix T, the numbers of labeled positive and negative images in the dataset are N̂_{+1} = N_{+1}(1 − ρ_{+1}) + N_{−1}ρ_{−1} and N̂_{−1} = N_{−1}(1 − ρ_{−1}) + N_{+1}ρ_{+1}, respectively. In practice, the inverse noise rates π_{+1} = p(y = −1|ŷ = +1) and π_{−1} = p(y = +1|ŷ = −1) are more widely used in the label noise problem. The inverse noise matrix can be represented as

H = [ 1 − π_{−1}    π_{−1}
      π_{+1}        1 − π_{+1} ].   (4)

The inverse noise matrix H and the noise matrix T must satisfy the relationship

H_{ij} = T_{ji} / Σ_{k=1}^{c} T_{ik},   (5)

where c is the number of categories; for a binary classification problem, c = 2.
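To make the connection between ρ and π concrete, the snippet below works through the label counts just derived; it is our own illustration with hypothetical numbers, not values from the paper:

```python
import numpy as np

# Hypothetical counts and noise rates, only to exercise the count identities.
N_pos, N_neg = 300000, 160000          # true positives / negatives (assumed)
rho_pos, rho_neg = 0.2, 0.1            # rho_{+1}, rho_{-1} (assumed)

# Counts of images *labeled* positive/negative, as derived in the text.
N_hat_pos = N_pos * (1 - rho_pos) + N_neg * rho_neg
N_hat_neg = N_neg * (1 - rho_neg) + N_pos * rho_pos

# Inverse noise rates by Bayes' rule: pi_{+1} = p(y = -1 | y_hat = +1), etc.
pi_pos = N_neg * rho_neg / N_hat_pos   # mislabeled negatives among labeled positives
pi_neg = N_pos * rho_pos / N_hat_neg   # mislabeled positives among labeled negatives
```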
If there is no noise in the dataset, it can easily be shown that the softmax loss will push the predicted probabilities of the images toward 1 or 0. Due to random noise on the labels, the pre-training prediction probabilities deviate from this ideal state. In [20], [22], ρ_{−1} and ρ_{+1} are approximated as

ρ_{+1} = 1 − max_i p̂(ŷ_i = +1|x_i),
ρ_{−1} = 1 − max_i p̂(ŷ_i = −1|x_i).   (6)

According to [20], Eq. (6) is consistent with the upper bound of the noise. We experimentally find that it only works well when the noise ratio is not too large. In this subsection, we propose a novel noise estimation method that is closer to the true noise ratio than this upper bound and does not introduce any empirical parameters.

It is assumed that the classifier prediction probabilities obey a Gaussian distribution, which is easy to satisfy when the number of samples is large enough. The true noise ratio can then be seen as the deviation between the mean center of the Gaussian distribution and the ideal prediction probability 1. In other words, the probability that the best example of a class is assigned to its true class, 1 − ρ_y, lies at the maximum of the prediction probability density.

Our proposed noise estimation algorithm is carried out in the following three steps; a sketch of the whole procedure is given after the list.

1) Re-sorting the prediction probabilities of the images from small to large:

p̂(ŷ|x_{n+1}) ≥ p̂(ŷ|x_n),   p̂(ŷ|x_{N_ŷ}) = max p̂(ŷ|x),   n = 1, 2, ..., N_ŷ.   (7)

2) Constructing a fitting function g(x) between the prediction probability and its frequency:

n_{p̂(ŷ_n|x_n)} = g_{ŷ_n}(p̂(ŷ_n|x_n)),   N_{p̂(ŷ_n|x_n)} = ∫₀¹ g_{ŷ_n}(p̂(ŷ_n|x_n)) dp,   p_{ŷ_n} ∈ [0, 1].   (8)

3) Re-positioning the noise ratio as the deviation between the probability at the maximum of the fitting function and the ideal prediction probability 1:

ρ_{+1} = 1 − argmax_p g_{+1}(p̂(ŷ_n = +1|x_n)),
ρ_{−1} = 1 − argmax_p g_{−1}(p̂(ŷ_n = −1|x_n)).   (9)
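The following NumPy/SciPy sketch (our own illustration under the paper's assumptions) carries out the three steps for one observed class: it bins the sorted probabilities, fits the double-Gaussian form used in Section III-B, and returns ρ as one minus the location of the fitted peak. The bin count and the initial guess p0 are our choices, not values fixed by the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def double_gauss(x, a1, b1, c1, a2, b2, c2):
    # Fitting form from Sec. III-B: g(x) = a1*exp(-((x-b1)/c1)^2) + a2*exp(-((x-b2)/c2)^2)
    return a1 * np.exp(-((x - b1) / c1) ** 2) + a2 * np.exp(-((x - b2) / c2) ** 2)

def estimate_noise_rate(probs, bins=100):
    """Steps 1-3 for one observed class: probs holds p(class | x) for all
    samples labeled with that class."""
    probs = np.sort(probs)                  # step 1 (ordering kept to mirror the paper)
    counts, edges = np.histogram(probs, bins=bins, range=(0.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    p0 = [counts.max(), 0.9, 0.05, counts.max() / 2.0, 0.5, 0.2]   # assumed init
    params, _ = curve_fit(double_gauss, centers, counts, p0=p0, maxfev=10000)  # step 2
    grid = np.linspace(0.0, 1.0, 1001)
    peak = grid[np.argmax(double_gauss(grid, *params))]            # step 3: fitted peak
    return 1.0 - peak                                              # Eq. (9)
```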
Fig. 3. Prediction probability ranking diagram for samples labeled as positive.

Fig. 3 shows the probability ranking of the samples labeled as positive in the Sentibank dataset. The black curve denotes the fitting function and the red line denotes the probability corresponding to 1 − ρ_{+1}. Details of the Sentibank dataset and the corresponding experimental settings can be found in Section III.
C. Re-training with reweighting loss

Let D be the distribution of a pair of image and true label (x_i, y_i) ∈ X × Y, where X ⊆ R^d and Y = {−1, +1} denotes the label space. We denote by D_ρ the distribution corresponding to noisy labels (x_i, ŷ_i). f(x_i) denotes the classifier decision function and L(f(x_i), y_i) represents the loss function. The expected risk can be expressed as [20]

R_{L,D}(f) = R[D, f, L] = E_{(x_i,y_i)∼D}[L(f(x_i), y_i)].   (10)

However, the true-label data distribution D is unknown; we only have the noisy-label sample distribution D_ρ. Luckily, according to [20], the expected risk can be transformed into a function of D_ρ:

R_{L,D}(f) = E_{(x_i,y_i)∼D}[L(f(x_i), y_i)]
           = E_{(x_i,ŷ_i)∼D_ρ}[ (p_D(x_i, y_i) / p_{D_ρ}(x_i, ŷ_i)) L(f(x_i), ŷ_i) ]
           = R[D_ρ, f, (p_D(x_i, y_i) / p_{D_ρ}(x_i, ŷ_i)) L(f(x_i), ŷ_i)]
           = R[D_ρ, f, β(x_i, ŷ_i) L(f(x_i), ŷ_i)]
           = R_{βL,D_ρ}(f),   (11)

where p_{D_ρ}(x_i, ŷ_i) = p̂(x_i, ŷ_i) and p_D(x_i, y_i) = p(x_i, y_i) represent the probability of the sample x_i being assigned to the observed class and to the true class, respectively. l(x_i, ŷ_i) and L(f(x_i), ŷ_i) are only different forms of the loss function. Thus, substituting the pre-training softmax loss Eq. (1) into Eq. (11), the reweighting loss can be defined as

L_r(f(x_i), y_i) = −(1/2N) Σ_{i=1}^{N} β(x_i, ŷ_i = +1)(1 + ŷ_i) log p̂(ŷ_i = +1|x_i)
                  −(1/2N) Σ_{i=1}^{N} β(x_i, ŷ_i = −1)(1 − ŷ_i) log p̂(ŷ_i = −1|x_i).   (12)

Since label noise does not change x_i, we can set p(x_i) = p̂(x_i). Therefore, β can be written as [20]

β(x_i, ŷ_i) = p_D(x_i, y_i) / p_{D_ρ}(x_i, ŷ_i) = p(x_i, y_i) / p̂(x_i, ŷ_i) = p(y_i|x_i)p(x_i) / (p̂(ŷ_i|x_i)p̂(x_i)) = p(y_i|x_i) / p̂(ŷ_i|x_i).   (13)

According to Eq. (13), β(x_i, ŷ_i = +1) can be simplified as

β(x_i, ŷ_i = +1) = p(y_i = +1|x_i) / p̂(ŷ_i = +1|x_i)
                 = [p(y_i = +1|ŷ_i = +1) p̂(ŷ_i = +1|x_i) + p(y_i = +1|ŷ_i = −1) p̂(ŷ_i = −1|x_i)] / p̂(ŷ_i = +1|x_i)
                 = 1 − π_{−1} − π_{+1} + π_{−1} / p̂(ŷ_i = +1|x_i)
                 ≥ 1 − π_{+1}.   (14)

Similarly, β(x_i, ŷ_i = −1) can be written as

β(x_i, ŷ_i = −1) = p(y_i = −1|x_i) / p̂(ŷ_i = −1|x_i)
                 = [p(y_i = −1|ŷ_i = −1) p̂(ŷ_i = −1|x_i) + p(y_i = −1|ŷ_i = +1) p̂(ŷ_i = +1|x_i)] / p̂(ŷ_i = −1|x_i)
                 = 1 − π_{−1} − π_{+1} + π_{+1} / p̂(ŷ_i = −1|x_i)
                 ≥ 1 − π_{−1}.   (15)

When p̂(ŷ_i|x_i) = 1, Eq. (14) and Eq. (15) give β(x_i, ŷ_i) = 1 − π_{ŷ_i}.
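A direct transcription of Eq. (14)/(15) into code (our own sketch; the variable names are assumptions) makes the per-sample weight explicit:

```python
def beta_weight(p_obs, y_obs, pi_pos, pi_neg):
    """Eq. (14)/(15): importance weight beta(x, y_hat).

    p_obs  : predicted probability of the observed label, p(y_hat = y_obs | x).
    y_obs  : observed label in {-1, +1}.
    pi_pos : pi_{+1} = p(y = -1 | y_hat = +1); pi_neg : pi_{-1} = p(y = +1 | y_hat = -1).
    """
    pi_flip = pi_neg if y_obs == +1 else pi_pos   # pi_{-1} for y_hat=+1, pi_{+1} for y_hat=-1
    return 1.0 - pi_neg - pi_pos + pi_flip / p_obs
```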
Therefore, the reweighting loss defined in Eq. (12) can be approximated as

L_r(f(x_i), y_i) = (1 − π_{−1} − π_{+1}) l(x_i, ŷ_i)
                  −(1/2N) Σ_{i=1}^{N} [π_{−1}(1 + ŷ_i) log p̂(ŷ_i = +1|x_i)] / p̂(ŷ_i = +1|x_i)
                  −(1/2N) Σ_{i=1}^{N} [π_{+1}(1 − ŷ_i) log p̂(ŷ_i = −1|x_i)] / p̂(ŷ_i = −1|x_i).   (16)

Applying a global loss weight converted from the noise matrix greatly reduces the computation. However, the term 1/p̂(x_i, ŷ_i) in the loss easily causes gradient explosion; we use BReLU [24] as the activation function to alleviate this. Since the terms 1/p̂(x_i, ŷ_i) and −log p̂(x_i, ŷ_i) have the same gradient direction, the term 1/p̂(x_i, ŷ_i) only affects the convergence speed and does not change the selection characteristics of the classifier. When the reweighting loss approaches its minimum, p̂(x_i, ŷ_i) also approaches 1. Therefore, we can further simplify the loss function by removing the term 1/p̂(x_i, ŷ_i). Although
difficult to prove theoretically, experiments have demonstrated its effectiveness. Thus, Eq. (16) can be further simplified as

L_r(f(x_i), y_i) = −(1/2N) Σ_{i=1}^{N} (1 − π_{+1})(1 + ŷ_i) log p̂(ŷ_i = +1|x_i)
                  −(1/2N) Σ_{i=1}^{N} (1 − π_{−1})(1 − ŷ_i) log p̂(ŷ_i = −1|x_i).   (17)
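Eq. (17) amounts to a class-wise weighted cross-entropy. A minimal NumPy sketch of the final re-training loss (ours, not the authors' code), assuming p_pos holds p̂(ŷ = +1|x) for a batch:

```python
import numpy as np

def reweighted_softmax_loss(p_pos, y_obs, pi_pos, pi_neg):
    """Eq. (17): softmax loss with per-class weights (1 - pi) from the
    estimated inverse noise rates."""
    n = len(y_obs)
    return -np.sum((1 - pi_pos) * (1 + y_obs) * np.log(p_pos)
                   + (1 - pi_neg) * (1 - y_obs) * np.log(1.0 - p_pos)) / (2 * n)
```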
TABLE I
THE NUMBER OF IMAGES PER SENTIMENT CLASS FOR SENTIBANK AND TWITTER “FIVE AGREE”

Dataset                 Positive   Negative   Total
Sentibank               293150     167241     460391
Twitter “five agree”    501        301        802

III. EXPERIMENTS

A. Datasets

Sentibank [15]. Sentibank is a widely used image sentiment dataset, which contains about half a million images collected from Flickr with selected ANPs as queries. Each image is marked as negative or positive together with its corresponding adjective noun pair. The numbers of positive and negative images are shown in TABLE I.

Twitter “five agree” [14]. Twitter “five agree” is collected from tweets and contains 501 positive images and 301 negative images, as shown in TABLE I. Each image is labeled by at least 5 AMT workers, which ensures that the sentiment labels are widely agreed upon. It is widely considered a validation set for binary sentiment classification.
B. Implementation Details

Training Sentibank Details: All Sentibank images are resized such that the short side is of size 256, and the long side is center-cropped to size 256. GoogLeNet is adopted as the Sentibank training network and the pre-training loss is the softmax loss of Eq. (1). For pre-training and re-training, we use the same optimization parameters. We run SGD with momentum set to 0.9 and batch size 64. The learning rate is first set to 0.01 and is divided by 10 every 5 epochs (20 epochs in total). The weight decay is 10^{-5}.

Estimating Noise Ratio Details: All Sentibank images are fed into the pre-trained classifier to obtain each image's prediction probability. Then, the prediction probabilities of images labeled as positive (negative) being assigned to the positive (negative) class are re-sorted from small to large. We use a double Gaussian function to fit the probability ranking curve, of the form g(x) = a1·exp(−((x − b1)/c1)²) + a2·exp(−((x − b2)/c2)²). Finally, we compute the deviation between the probability at the corresponding peak of the fitting function and 1.

Fine-tuning Twitter “Five Agree” Details: Following [14], five-fold cross-validation is adopted for performance evaluation on Twitter “five agree”. All Twitter “five agree” images are also resized such that the short side is of size 256, and the long side is center-cropped to size 256. The fine-tuned network is optimized with SGD. The momentum is 0.9 and the batch size is 32. The learning rate is set to 0.001 and divided by 10 every 6 epochs (24 epochs in total), and the weight decay is 10^{-5}.
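For reference, the two optimization schedules above can be written down as follows (a PyTorch-style sketch of our own; the paper does not name a framework, and `model` is a placeholder for the GoogLeNet instance):

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

def make_optimizer(model, phase="sentibank"):
    if phase == "sentibank":   # pre-/re-training: lr 0.01, /10 every 5 of 20 epochs
        opt = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-5)
        sched = StepLR(opt, step_size=5, gamma=0.1)
    else:                      # Twitter fine-tuning: lr 0.001, /10 every 6 of 24 epochs
        opt = SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-5)
        sched = StepLR(opt, step_size=6, gamma=0.1)
    return opt, sched
```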
C. Results Compared with Other Methods

We compare the proposed method with other visual sentiment analysis methods as well as noisy-label methods, and report the results on Twitter. The experiments are conducted under three settings.

1) Training and testing on Twitter: The methods GCH [25], LCH [25], GCH+BOW [25], LCH+BOW [25], and Sentribute [5] train and test their classifiers on Twitter via five-fold cross-validation. The results are shown in the first five rows of TABLE II.

2) Training on other datasets and testing on Twitter: In this setting, Sentibank [15] and DeepSentibank [26] randomly select 90% of the dataset images for training and the rest for testing to obtain the optimal classifier parameters, and then use the trained classifier to evaluate Twitter directly. However, the test set of Sentibank also contains noisy labels. In our Sentibank experiments, we do not split the Sentibank dataset 90%/10% for training and testing, but instead use the whole dataset as the training set. We reimplement the Backward corrected loss [22] and the Forward corrected loss [22] under this setting. Here, the Backward corrected loss [22] and the Forward corrected loss [22] refer to multiplying the softmax loss by T^{-1} and multiplying the softmax probability by T, respectively. We also use their recommended empirical parameter α = 0.97. The Pre-training Sentibank experiment uses the softmax loss, and the Re-training Sentibank (ours) experiment uses the softmax loss corrected by our reweighting method. As shown in TABLE II, our proposed reweighting method is superior to the Backward corrected loss [22] and the Forward corrected loss [22]. Moreover, the accuracy of our proposed reweighting method without fine-tuning on Twitter is also higher than most of the compared methods.

3) Training on other datasets, fine-tuning on Twitter, and testing on Twitter: Fine-tuned Caffenet [11] and Fine-tuned Googlenet [10] train their deep learning models on ImageNet, fine-tune on Twitter, and report the results on Twitter. The other methods train the classifiers on Sentibank, then fine-tune on Twitter and report the results on Twitter. We also fine-tune the Backward corrected loss [22], Forward corrected loss [22], Pre-training Sentibank, and Re-training Sentibank (ours) classifiers on Twitter. As shown in TABLE II, the Backward corrected loss [22] with fine-tuning on Twitter shows no obvious improvement over the original softmax loss (Pre-training Sentibank). In this respect, our proposed method shows obvious advantages. Furthermore, the accuracy of the proposed reweighting method is above 90%, which outperforms state-of-the-art results on visual sentiment analysis.
TABLE II
COMPARISON RESULTS ON TWITTER “FIVE AGREE”

Method                                     Accuracy
GCH [25]                                   68.4%
LCH [25]                                   71.0%
GCH+BOW [25]                               71.0%
LCH+BOW [25]                               71.7%
Sentribute [5]                             73.8%
Sentibank [15]                             70.9%
DeepSentibank [26]                         77.4%
2Conv+4FC [14]                             78.3%
PCNN [14]                                  77.3%
DCAN(Alex) [18]                            82.3%
DCAN(Alex)+ReKL [18]                       83.8%
VGG+ft+obj+senti+If(K) [19]                88.9%
Fine-tuned Caffenet [11]                   83.0%
Fine-tuned Googlenet [10]                  86.1%
Backward corrected loss [22]               80.6%
Forward corrected loss [22]                82.0%
Pre-training Sentibank                     80.3%
Re-training Sentibank (ours)               83.4%
Backward corrected loss [22] + Fine-tune   89.1%
Forward corrected loss [22] + Fine-tune    89.6%
Pre-training Sentibank + Fine-tune         89.1%
Re-training Sentibank + Fine-tune (ours)   90.8%
IV. CONCLUSION

To solve the problem of visual sentiment analysis with noisy labels, we introduce a deep learning noise estimation method. We show that it is not necessary to drop the unreliable labels in a dataset; we only need to use the estimated noise as the weight of the loss, which reduces the dependence of visual sentiment analysis on accurately labeled datasets. Experiments on public datasets have further shown the superiority of our proposed algorithm. In the future, we will extend the proposed method to multi-category visual sentiment analysis and give a more rigorous proof of the noise estimation theory.
REFERENCES

[1] Nass, C., and Brave, S. “Emotion in human-computer interaction.” The Human-Computer Interaction Handbook. CRC Press, pp. 94–109, 2007.
[2] Siersdorfer, S., Minack, E., Deng, F., and Hare, J. “Analyzing and predicting sentiment of images on the social web.” Proceedings of the 18th ACM International Conference on Multimedia, pp. 715–718, October 2010.
[3] Machajdik, J., and Hanbury, A. “Affective image classification using features inspired by psychology and art theory.” Proceedings of the 18th ACM International Conference on Multimedia, pp. 83–92, 2010.
[4] Lu, X., Suryanarayan, P., Adams Jr., R. B., Li, J., Newman, M. G., and Wang, J. Z. “On shape and the computability of emotions.” Proceedings of the 20th ACM International Conference on Multimedia, pp. 229–238, October 2012.
[5] Yuan, J., Mcdonough, S., You, Q., and Luo, J. “Sentribute: image sentiment analysis from a mid-level perspective.” International Workshop on Issues of Sentiment Discovery and Opinion Mining, Vol. 17, pp. 1–8, August 2013.
[6] Zhao, S., Gao, Y., Jiang, X., Yao, H., Chua, T. S., and Sun, X. “Exploring principles-of-art features for image emotion recognition.” Proceedings of the 22nd ACM International Conference on Multimedia, pp. 47–56, November 2014.
[7] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., and Anguelov, D. “Going deeper with convolutions.” IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
[8] Girshick, R., Donahue, J., Darrell, T., and Malik, J. “Rich feature hierarchies for accurate object detection and semantic segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587, 2014.
[9] Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs.” IEEE Transactions on Pattern Analysis and Machine Intelligence 40(4), pp. 834–848, 2016.
[10] Xu, C., Cetintas, S., Lee, K. C., and Li, L. J. “Visual sentiment prediction with deep convolutional neural networks.” arXiv preprint, 2014.
[11] Campos, V., Salvador, A., Giro-i-Nieto, X., and Jou, B. “Diving deep into sentiment: Understanding fine-tuned CNNs for visual sentiment prediction.” Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia, pp. 57–62, 2015.
[12] Campos, V., Jou, B., and Giro-i-Nieto, X. “From pixels to sentiment: Fine-tuning CNNs for visual sentiment prediction.” Image and Vision Computing, Vol. 65, pp. 15–22, 2017.
[13] Islam, J., and Zhang, Y. “Visual sentiment analysis for social images using transfer learning approach.” IEEE International Conferences on Big Data and Cloud Computing, pp. 124–130, October 2016.
[14] You, Q., Luo, J., Jin, H., and Yang, J. “Robust image sentiment analysis using progressively trained and domain transferred deep networks.” Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 381–388, 2015.
[15] Borth, D., Ji, R., Chen, T., Breuel, T., and Chang, S. F. “Large-scale visual sentiment ontology and detectors using adjective noun pairs.” Proceedings of the 21st ACM International Conference on Multimedia, pp. 223–232, October 2013.
[16] Rolnick, D., Veit, A., Belongie, S., and Shavit, N. “Deep learning is robust to massive label noise.” arXiv preprint, 2017.
[17] Wu, L., Liu, S., Jian, M., Luo, J., Zhang, X., and Qi, M. “Reducing noisy labels in weakly labeled data for visual sentiment analysis.” IEEE International Conference on Image Processing (ICIP), pp. 1322–1326, September 2017.
[18] Wang, J., Fu, J., Xu, Y., and Mei, T. “Beyond object recognition: visual sentiment analysis with deep coupled adjective and noun neural networks.” International Joint Conference on Artificial Intelligence, pp. 3484–3490, July 2016.
[19] Sun, M., Yang, J., Wang, K., and Shen, H. “Discovering affective regions in deep convolutional neural networks for visual sentiment prediction.” IEEE International Conference on Multimedia and Expo, pp. 1–6, July 2016.
[20] Liu, T., and Tao, D. “Classification with noisy labels by importance reweighting.” IEEE Transactions on Pattern Analysis and Machine Intelligence 38(3), pp. 447–461, 2016.
[21] Sugiyama, M., Nakajima, S., and Kashima, H. “Direct importance estimation with model selection and its application to covariate shift adaptation.” Advances in Neural Information Processing Systems, pp. 1433–1440, 2008.
[22] Patrini, G., Rozza, A., Menon, A. K., Nock, R., and Qu, L. “Making deep neural networks robust to label noise: A loss correction approach.” IEEE Conference on Computer Vision and Pattern Recognition, pp. 2233–2241, 2017.
[23] Frénay, B., and Verleysen, M. “Classification in the presence of label noise: a survey.” IEEE Transactions on Neural Networks and Learning Systems 25(5), pp. 845–869, 2014.
[24] Cai, B., Xu, X., Jia, K., Qing, C., and Tao, D. “DehazeNet: An end-to-end system for single image haze removal.” IEEE Transactions on Image Processing 25(11), pp. 5187–5198, 2016.
[25] Siersdorfer, S., Minack, E., Deng, F., and Hare, J. “Analyzing and predicting sentiment of images on the social web.” Proceedings of the 18th ACM International Conference on Multimedia, pp. 715–718, 2010.
[26] Chen, T., Borth, D., Darrell, T., and Chang, S. F. “DeepSentiBank: Visual sentiment concept classification with deep convolutional neural networks.” arXiv preprint arXiv:1410.8586, 2014.
