Emotion Recognition in Context
Abstract
Understanding what a person is experiencing from her
frame of reference is essential in our everyday life. For this
reason, one can think that machines with this type of ability
would interact better with people. However, there are no
current systems capable of understanding in detail people’s
emotional states. Previous research on computer vision to
recognize emotions has mainly focused on analyzing the fa-
cial expression, usually classifying it into the 6 basic emo-
tions [11]. However, the context plays an important role in
emotion perception, and when the context is incorporated,
we can infer more emotional states. In this paper we present
the “Emotions in Context Database” (EMOTIC), a dataset
of images containing people in context in non-controlled
environments. In these images, people are annotated with
26 emotional categories and also with the continuous di-
mensions valence, arousal, and dominance [21]. With the
EMOTIC dataset, we trained a Convolutional Neural Net-
work model that jointly analyses the person and the whole
scene to recognize rich information about emotional states.
With this, we show the importance of considering the con-
text for recognizing people’s emotions in images, and pro-
vide a benchmark in the task of emotion recognition in vi-
sual context.
1. Introduction
Understanding how people feel plays a crucial role in
social interaction. This capacity is necessary to perceive,
anticipate and respond with care to people’s reactions. We
are remarkably skilled at this task and we regularly make
guesses about people’s emotions in our everyday life. Par-
ticularly, when we observe someone, we can estimate a lot
of information about that person’s emotional state, even
without any additional knowledge about this person. As
an example, take a look at the images of Figure 1. Let us
put ourselves in these people’s situations and try to estimate
what they feel. In Figure 1.a we can recognize an emo-
tional state of anticipation, since this person is constantly
looking at the road to correctly adapt his trajectory. We can
also recognize that this person feels excitement and that he
is engaged in, or absorbed by, the activity he is performing.
We can also say that the overall emotion he is feeling
is positive; he is active, and he seems confident with the ac-
tivity he is performing, so he is in control of the situation.
Similar detailed estimations can be made about the people
marked with a red rectangle in the other images of Figure 1.
Recognizing people’s emotional states from images is an
active area of research among the computer vision com-
munity. Section 2 describes some of the recent works on
this topic. Overall, in recent years we have observed impres-
sive progress in recognizing the 6 basic emotions (anger,
disgust, fear, happiness, sadness, and surprise) from facial
expression. Some interesting efforts have also been made in
understanding body language and in using body pose
features to recognize some specific emotional states. How-
ever, in this previous research on emotion recognition, the
context of the subject is usually ignored.
Some works in psychology show evidence of the im-
portance of context in the perception of emotions [3]. In
most cases, when we analyze a wider view instead
of focusing only on the person, we can recognize additional af-
fective information that cannot be recognized if the context
is not taken into account. For example, in Figure 1.c, we
can see that the boy feels annoyance because he has to eat
an apple while the girl next to him has chocolate, which is
something that he feels yearning (strong desire) for. The
presence of the girl, the apple, and the chocolate provides the nec-
essary clues to correctly interpret what he expresses with his
facial expression.
In fact, if we consider the context, we can make reason-
able guesses about emotional states even when the face of
the person is not visible, as illustrated in Figures 1.b and 1.d.
The person in the red rectangle of Figure 1.b is picking a
doughnut and he probably feels yearning to eat it. He is
participating in a social event with his colleagues, showing
engagement. He is feeling pleasure eating the doughnuts
and happiness for the relaxed break along with other peo-
ple. In Figure 1.d, the person is admiring the beautiful land-
scape with esteem. She seems to be enjoying the moment
(happiness), and she seems calm and relaxed (peace). We
do not know exactly what is on the people’s minds, but we
are able to reasonably extract relevant affective information
just by looking at them in their situations.
This paper addresses the problem of recognizing emo-
tional states of people in context. The first contribution of
our work is the Emotions in Context Database (EMOTIC),
which is described in Section 3. The EMOTIC database
is a collection of images with people in their context, an-
notated according to the emotional states that an observer
can estimate from the whole situation. Specifically, images
are annotated with two complementary systems: (1) an ex-
tended list of 26 affective categories that we collected, and
(2) the common continuous dimensions Valence, Arousal,
and Dominance [21]. The combination of these two emo-
tion representation approaches produces a detailed model
that gets closer to the richness of emotional states that hu-
mans can perceive [2].
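To make these two annotation systems concrete, the following sketch shows one possible way to represent a single person annotation, combining a subset of the 26 discrete categories with the three continuous dimensions. The field names, bounding-box format, and the [1, 10] value scale are illustrative assumptions, not the actual EMOTIC annotation format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class EmotionAnnotation:
    """Hypothetical record for one annotated person in an EMOTIC-style image."""
    bbox: Tuple[int, int, int, int]                       # person box (x1, y1, x2, y2), assumed pixel coordinates
    categories: List[str] = field(default_factory=list)   # subset of the 26 discrete emotional categories
    valence: float = 0.0                                   # continuous dimensions; [1, 10] scale assumed here
    arousal: float = 0.0
    dominance: float = 0.0

# Example: the cyclist of Figure 1.a, using the labels discussed in the introduction.
annotation = EmotionAnnotation(
    bbox=(120, 40, 360, 480),
    categories=["Anticipation", "Excitement", "Engagement", "Confidence"],
    valence=7.5,    # overall positive emotion
    arousal=8.0,    # active
    dominance=7.0,  # in control of the situation
)
```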
Using the EMOTIC database, we test a Convolutional
Neural Network (CNN) model for recognizing emotions in
context. Section 4 describes the model, while Section 5
presents our experiments. From our results, we draw two
interesting conclusions. First, we see that the context con-
tributes relevant information for emotional state recogni-
tion. Second, we observe that combining categories and
continuous dimensions during training results in a more
robust system for recognizing emotional states.
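As a rough illustration of the kind of model described above, the sketch below combines one branch that sees the person and another that sees the whole scene, and predicts the 26 discrete categories and the 3 continuous dimensions jointly. The PyTorch framework, ResNet-18 backbones, feature sizes, and the equally weighted joint loss are assumptions made for illustration; they are not the exact architecture or training setup used in the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

class ContextEmotionNet(nn.Module):
    """Sketch of a two-branch model: person crop plus whole image, fused for joint prediction."""
    def __init__(self, num_categories: int = 26, num_continuous: int = 3):
        super().__init__()
        # Two feature extractors; ResNet-18 backbones are an arbitrary choice here.
        self.body_branch = models.resnet18(weights=None)
        self.body_branch.fc = nn.Identity()
        self.context_branch = models.resnet18(weights=None)
        self.context_branch.fc = nn.Identity()
        fused_dim = 512 + 512  # 512-d features from each ResNet-18 branch
        self.category_head = nn.Linear(fused_dim, num_categories)    # multi-label logits
        self.continuous_head = nn.Linear(fused_dim, num_continuous)  # valence, arousal, dominance

    def forward(self, body_crop: torch.Tensor, whole_image: torch.Tensor):
        fused = torch.cat([self.body_branch(body_crop),
                           self.context_branch(whole_image)], dim=1)
        return self.category_head(fused), self.continuous_head(fused)

def joint_loss(cat_logits, cont_pred, cat_target, cont_target):
    # Train both outputs together; the 1:1 weighting of the two terms is an assumption.
    return (nn.functional.binary_cross_entropy_with_logits(cat_logits, cat_target)
            + nn.functional.mse_loss(cont_pred, cont_target))
```

A forward pass takes the person crop and the full image, both resized to the backbone's expected input size (e.g., 224x224); training both heads together reflects the second conclusion above, that combining categories and continuous dimensions yields a more robust recognizer.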
2. Related work
Most of the research in computer vision to recognize
emotional states in people has focused on facial ex-
pression analysis (e.g., [4, 13]). We find a large variety of
methods developed to recognize the 6 basic emotions de-
fined by the psychologists Ekman and Friesen [11]. Some
of these methods are based on the Facial Action Coding Sys-
tem [15, 29]. This system uses a set of specific localized
movements of the face, called Action Units, to encode the
facial expression. These Action Units can be recognized
from geometric-based features and/or appearance features
extracted from face images [23, 19, 12]. Recent works for
emotion recognition based on facial expression use CNNs
to recognize the emotions and/or the Action Units [4].
Instead of recognizing emotion categories, some recent
works on facial expression [28] use the continuous dimen-
sions of the VAD Emotional State Model [21] to represent
emotions. The VAD model describes emotions using 3 nu-
merical dimensions: Valence (V), that measures how pos-
itive or pleasant an emotion is, ranging from negative to
positive; Arousal (A), that measures the agitation level of
the person, ranging from non-active / calm to agitated /
ready to act; and Dominance (D), that measures the control
level of the situation by the person, ranging from submissive
/ non-control to dominant / in-control. On the other hand,
Du et al. [10] proposed a set of 21 facial emotion categories,
defined as different combinations of the basic emotions, like
‘happily surprised’ or ‘happily disgusted’. This categoriza-
tion gives more detail about the expressed emotion.
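To make the three VAD dimensions just described concrete, the minimal sketch below maps a VAD triple to the qualitative poles of each axis. The [1, 10] scale and its 5.5 midpoint are assumptions used only for illustration; the VAD model itself only defines the direction of each axis.

```python
def describe_vad(valence: float, arousal: float, dominance: float,
                 midpoint: float = 5.5) -> str:
    """Map a VAD triple to the qualitative extremes of each dimension."""
    v = "positive / pleasant" if valence > midpoint else "negative / unpleasant"
    a = "agitated / ready to act" if arousal > midpoint else "calm / non-active"
    d = "dominant / in control" if dominance > midpoint else "submissive / not in control"
    return f"valence: {v}; arousal: {a}; dominance: {d}"

# Example: a person who feels positive, active, and in control (cf. Figure 1.a).
print(describe_vad(7.5, 8.0, 7.0))
```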
Although most of the works in recognizing emotions are
focused on face analysis, there are a few works in computer
vision that address emotion recognition using other visual
clues apart from the face. For instance, some works [22]
consider the location of the shoulders, in addition to face
features, to recognize basic emotions. More gen-
erally, Schindler et al. [27] used the body pose to recognize
the 6 basic emotions, performing experiments on a small
dataset of non-spontaneous poses acquired under controlled
conditions.
In recent years we have also observed a significant emer-
gence of affective datasets for recognizing people's emotions.
The studies [17, 18] establish the relationship between af-
fect and body posture, using the base-rate of human observers
as ground truth. The data consists of a spontaneous
subset acquired under a restrictive setting (people playing
Wii games). In the EMOTIW challenge [7], the AFEW database
[8] focuses on emotion recognition in video frames taken
from movies and TV shows, while the HAPPEI database
[9] addresses the problem of group level emotion estima-
tion. In this work we can see a first attempt at using
context for the problem of predicting happiness in groups
of people. Finally, the MSCOCO dataset has been recently
annotated with object attributes [24], including some feelings.