Received 22 October 2008; received in revised form 25 June 2010; accepted 26 July 2010
Abstract
Recent metaphor research has revealed that metaphor comprehension involves both categorization
and comparison processes. This finding has triggered the following central question: Which property
determines the choice between these two processes for metaphor comprehension? Three competing
views have been proposed to answer this question: the conventionality view (Bowdle & Gentner,
2005), aptness view (Glucksberg & Haught, 2006b), and interpretive diversity view (Utsumi, 2007);
these views, respectively, argue that vehicle conventionality, metaphor aptness, and interpretive
diversity determine the choice between the categorization and comparison processes. This article
attempts to answer the question regarding which views are plausible by using cognitive modeling
and computer simulation based on a semantic space model. In the simulation experiment, categoriza-
tion and comparison processes are modeled in a semantic space constructed by latent semantic analy-
sis. These two models receive word vectors for the constituent words of a metaphor and compute a
vector for the metaphorical meaning. The resulting vectors can be evaluated according to the degree
to which they mimic the human interpretation of the same metaphor; the maximum likelihood
estimation determines which of the two models better explains the human interpretation. The result
of the model selection is then predicted by three metaphor properties (i.e., vehicle conventionality,
aptness, and interpretive diversity) to test the three views. The simulation experiment for Japanese
metaphors demonstrates that both interpretive diversity and vehicle conventionality affect the choice
between the two processes. On the other hand, it is found that metaphor aptness does not affect this
choice. This result can be treated as computational evidence supporting the interpretive diversity and
conventionality views.
Keywords: Metaphor comprehension; Cognitive modeling; Semantic space model; Latent semantic
analysis (LSA); Categorization; Comparison; Interpretive diversity; Conventionality; Maximum
likelihood estimation
Correspondence should be sent to Akira Utsumi, Department of Informatics, The University of Electro-
Communications, 1-5-1 Chofugaoka, Chofushi, Tokyo 182-8585, Japan. E-mail: utsumi@inf.uec.ac.jp
252 A. Utsumi/Cognitive Science 35 (2011)
1. Introduction
Metaphors pervade language, both in spoken and written discourse. For example, in an
analysis of different types of discourse, Cameron (2008) found rates of 20 metaphors per
1,000 words in college lectures, 50 per 1,000 in ordinary discourse, and 60 per 1,000 in
discourse by teachers. Hence, it is no exaggeration to say that people cannot verbally
communicate with each other without using metaphors. Furthermore, an
increasing number of studies have revealed that metaphors are essentially involved in our
everyday thought (e.g., Gibbs, 2006; Kövecses, 2002; Lakoff & Johnson, 1980).
The prevalence of metaphor in language and thought has motivated a considerable num-
ber of cognitive studies on metaphor, particularly on the cognitive mechanism of metaphor
comprehension. These studies have focused on how people comprehend metaphors and have
discovered that two different processes, namely, comparison (Gentner, 1983; Gentner,
Bowdle, Wolff, & Boronat, 2001; Gentner & Markman, 1997) and categorization
(Glucksberg, 2001; Glucksberg & Keysar, 1990), are involved in metaphor comprehension.
Therefore, recent psycholinguistic studies have explored the metaphor property that
determines the choice between the two processes and proposed different views: the
conventionality view (Bowdle & Gentner, 2005; Gentner & Bowdle, 2008), aptness view
(Glucksberg & Haught, 2006a, 2006b; Jones & Estes, 2006), and interpretive diversity view
(Utsumi, 2007); these views, respectively, argue that vehicle conventionality, metaphor
aptness, and interpretive diversity determine the choice. However, empirical tests of these
three hybrid views have yielded conflicting results. Researchers have not yet reached a
consensus, and there is a heated debate regarding the kind of metaphors that
are processed as comparisons and as categorizations (e.g., Bowdle & Gentner, 2005; Gibbs,
2008; Glucksberg & Haught, 2006a, 2006b; Jones & Estes, 2006; Utsumi, 2007).
To answer the question regarding which of these metaphor views is most plausible, this
study adopts an approach that is different from that of existing studies, namely, computa-
tional modeling and simulation. In this approach, this study employs a semantic space model
(e.g., Landauer, McNamara, Dennis, & Kintsch, 2007; Padó & Lapata, 2007; Widdows,
2004). It models two processes of metaphor comprehension (categorization and comparison)
using a semantic space model and determines which of the two models better explains the
human interpretation of each Japanese metaphor obtained in a psychological experiment
(Utsumi, 2007). The result of the model selection procedure is then predicted by three meta-
phor properties, namely, conventionality, aptness, and interpretive diversity. The best pre-
dictor can be treated as the most plausible view of metaphor comprehension that determines
a shift between comparison and categorization.
The rest of this article is organized as follows. Section 2 illustrates the comparison and
categorization views of metaphor comprehension, and then presents three hybrid views for
reconciling the categorization and comparison views, which I intend to compare in this
study. Other metaphor views are also reviewed in this framework and discussed again in
Section 5 with reference to the implications of the result of this article. Section 3 explains
the semantic space model as a computational framework used for the simulation experiment,
and two algorithms as models of the comparison and categorization processes. Furthermore,
this section shows that these algorithms are plausible models of the psychological processes
of comparison and categorization by demonstrating that they are consistent with two
processing phenomena that the existing empirical studies of metaphor have employed to
determine the dominant process in metaphor comprehension, that is, grammatical concor-
dance between form and function and directionality in the metaphor comprehension process.
Section 4 presents the procedure for model selection and theory testing performed in the
simulation experiment and the result of the simulation experiment. Section 5 explains the
implications of the simulation results as well as some issues concerning the computational
methodology based on the semantic space model. Finally, Section 6 provides concluding
remarks for this study.
2. Metaphor theories
A metaphor consists of two concepts, which are referred to as topic and vehicle. A topic
is the concept described using a metaphor, and a vehicle is the concept employed to describe
a topic in a metaphor. For example, a metaphor ‘‘An X is a Y’’ has the topic X and the vehi-
cle Y. Note that analogy researchers often refer to the topic as the target and the vehicle as
the base (or the source).
midwife’’) are matched. In the later process of alignment, these local matches are collected
to form structurally consistent mappings, and these mappings are then merged into the com-
mon structure. Next, the elements connected to the common structure in the vehicle but not
initially present in the topic (e.g., the predicates specifying the gradual development of the
child within the mother) are projected as candidate inferences into the topic. Structural
alignment and inference projection constitute a process of comparison, which Gentner et al.
propose as a general cognitive mechanism for analogy, metaphor, and similarity.
The basic concept of the structure mapping theory is shared by dominant theories of
analogy (e.g., Holyoak & Thagard, 1989; Hummel & Holyoak, 1997; Larkey & Love,
2003). They accept the process of comparison, comprising alignment and projection, as a
mechanism of metaphorical mapping. (In particular, Hummel and Holyoak’s [1997] Learn-
ing and Inference with Schemas and Analogies [LISA] is basically the structure mapping
view.) The difference among these theories lies primarily in the kind of similarities that are
preferentially included in the common structure. The structure mapping theory emphasizes
relational similarities, particularly similarities between higher order relations according to
the systematicity principle. On the other hand, Holyoak and Thagard’s (1989) Analogical
Constraint Mapping Engine (ACME) and Hummel and Holyoak’s (1997) LISA argue that
semantic and pragmatic similarities are also required in analogical mapping. Despite such
differences, these theories of analogy can be classified into the comparison view (for the
same treatment, see Gentner & Bowdle, 2008).
The conceptual metaphor theory (Clausner & Croft, 1997; Grady, 1997; Kövecses, 2002;
Lakoff & Johnson, 1980, 1999) is related to the comparison view. The basic tenet of the
conceptual metaphor theory is that our conceptual system is structured by preexisting cross-
domain mappings, the so-called conceptual metaphors, which are grounded in embodied
experiences. Verbal metaphors (i.e., metaphorical expressions) are assumed to be compre-
hended merely on the basis of such conceptual metaphors. Conceptual metaphors differ in
the way they are experientially grounded; Grady (1997, 2005) distinguished primary meta-
phors (e.g., Happy Is Up)—directly motivated by experiential correlation—from complex
metaphors (e.g., Theories Are Buildings), which do not appear to be directly motivated by
experiential correlation but are constructed by combining primary metaphors. Although pri-
mary metaphors are embodied and not based on analogical mappings, some complex meta-
phors are based on analogical mappings (Grady, 2005). Hence, the conceptual metaphor
theory can be regarded as relevant to the comparison view, rather than to the categorization
view. Note that such an embodied view of metaphors leads to the critical argument that
metaphor comprehension cannot be simulated using non-embodied computational methods,
such as the semantic space model (e.g., Louwerse & Van Peer, 2009), which appears to be
incompatible with this study. This issue will be discussed in Section 5.2.
different roles in this comprehension process; the vehicle provides a superordinate category
that can be used to characterize the topic, whereas the topic constrains the dimensions by
which it can be characterized. For example, the metaphor ‘‘My job is a jail’’ is compre-
hended so that the topic my job is categorized as an ad hoc category like ‘‘unpleasant and
confining things’’ to which the vehicle jail typically belongs. In evoking the ad hoc cate-
gory, my job facilitates the attribution of features related to tasks and jobs, while blocking
out irrelevant features such as those related to jail building.
The recent development of Sperber and Wilson’s (1995) relevance theory adopts a view
of metaphor comprehension that is similar to the categorization view. Relevance theory
argues that metaphor comprehension can be regarded as the online construction of ad hoc
concepts by broadening or narrowing lexical (or literal) meanings of the vehicle (Carston,
2002; Sperber & Wilson, 2008). The role of the topic assumed in relevance theory differs
from Glucksberg’s attributive categorization view. Relevance theory assumes that the topic
influences the process of concept construction through pragmatic inferencing according to
the principle of relevance, rather than restricting the dimensions along which features are
mapped. Despite such a difference, the relevance-theoretic view of metaphor is very similar
to the categorization view; thus, it can be reasonably classified as the categorization view
(Carston, 2002).
The comparison and categorization views have their respective limitations and advantages;
this has led metaphor researchers to try to reconcile these two opposing views. Thus, recent
studies (e.g., Bowdle & Gentner, 2005; Glucksberg & Haught, 2006b; Jones & Estes,
2005, 2006; Utsumi, 2007) have provoked a heated debate on how these two views can be
reconciled; that is, they debate over the metaphor properties that determine the choice of
comprehension strategy between categorization and comparison. Three different views,
namely, the conventionality view, aptness view, and interpretive diversity view, have been
proposed for reconciling the categorization and comparison views; these three views are
summarized in Table 1. In this section, using the metaphor examples listed in Table 2,
I explain how these views predict the comprehension process.
Table 1
Comparison of three hybrid metaphor views that attempt to reconcile the categorization view and the comparison view

Metaphor View | Initial Process | Alternative Process | What Kind of Metaphor Should Activate the Alternative?
Conventionality view (Bowdle & Gentner, 2005) | Comparison | Categorization | Conventional metaphors (metaphors referring to a lexically encoded metaphoric category that can be attributed to the topic)
Aptness view (Glucksberg & Haught, 2006a, 2006b) | Categorization | Comparison | Less apt metaphors (metaphors that cannot evoke any metaphoric categories relevant to the important feature of the topic)
Interpretive diversity view (Utsumi, 2007) | Categorization | Comparison | Less diverse metaphors (metaphors that cannot evoke any rich metaphoric categories for the topic)

Note. Each view predicts that the comprehension of all metaphors starts with the initial process (the second column), but a specific kind of metaphor (the last column) is comprehended later by the alternative process (the third column).
Table 2
Examples of metaphors showing how the three metaphor views make the same or different predictions on the comprehension process

Metaphor Example | Conventionality View (Conventionality, Predicted Process) | Aptness View (Aptness, Predicted Process) | Interpretive Diversity View (Diversity, Predicted Process)
My job is a jail | Conventional, Categorization | Apt, Categorization | Diverse, Categorization
A gene is a blueprint | Conventional, Categorization | Apt, Categorization | Less diverse, Comparison
My memories are money | Conventional, Categorization | Less apt, Comparison | Diverse, Categorization
Birds are airplanes | Conventional, Categorization | Less apt, Comparison | Less diverse, Comparison
A goalie is a spider | Novel, Comparison | Apt, Categorization | Diverse, Categorization
That supermodel is a rail | Novel, Comparison | Apt, Categorization | Less diverse, Comparison
A child is a snowflake | Novel, Comparison | Less apt, Comparison | Diverse, Categorization
A fisherman is a spider | Novel, Comparison | Less apt, Comparison | Less diverse, Comparison

Note. Each metaphor example can be characterized by the distinctive properties (i.e., conventionality, aptness, diversity) of the three metaphor views, and its comprehension process is predicted on the basis of the characterized properties.
hoc category evokes many equally salient meanings (e.g., ‘‘A child is unique, delicate, and
likely to change the atmosphere just as snowfall changes the landscape’’). Similarly, some
less apt metaphors such as ‘‘My memories are money’’ may be interpretively diverse
because a number of less relevant but equally salient meanings are evoked from the vehicle
money as a potential metaphorical meaning (e.g., ‘‘My memories are precious to me,’’
‘‘I keep my memories in my soul so as not to lose them,’’ and ‘‘I cannot live without mem-
ory’’). Therefore, all these metaphors are predicted to be comprehended as categorizations.
On the other hand, metaphors are interpretively less diverse when the vehicle cannot
evoke a metaphoric category with many potential features, regardless of whether they are
conventional or whether they are apt. For example, the metaphor ‘‘A fisherman is a spider’’
is interpretively much less diverse because the vehicle spider cannot evoke any rich cate-
gories for the topic fisherman, although ‘‘A goalie is a spider’’ may imply diverse meanings.
For the same reason, some apt metaphors (e.g., ‘‘That supermodel is a rail’’) are also less
diverse. These less diverse metaphors can be reinterpreted via the comparison process. The
interpretive diversity view also predicts that even conventional metaphors (e.g., ‘‘A gene is a
blueprint’’) are comprehended via the comparison process if the conventional metaphoric
categories associated with the vehicle are semantically less rich (or semantically ‘‘narrow’’).
These three hybrid views have empirically demonstrated the superiority of their own
views. In these experiments, the metaphor–simile distinction was used as a valuable tool for
examining the use of comparison and categorization during metaphor comprehension. The
basic assumption underlying this method is that the linguistic form of a figurative statement
invokes a specific comprehension process. Metaphors of the form ‘‘An X is a Y’’ should
invite categorization because they are grammatically identical to literal categorization state-
ments, whereas similes of the form ‘‘An X is like a Y’’ should invite comparison because
they are grammatically identical to literal comparison statements. Therefore, if the process
initially invoked by the form is different from the process eventually used for comprehen-
sion, such figurative statements should be reinterpreted; thus, such statements are compre-
hended more slowly and are less preferred. Following Bowdle and Gentner (2005), I refer to
this link between form and process as grammatical concordance.
Bowdle and Gentner’s (2005) conventionality view predicts that novel topic–vehicle
pairs should be comprehended as comparisons, and according to grammatical concordance,
it follows that they should be more comprehensible when presented in the simile form ‘‘An
X is like a Y’’ than in the metaphor form ‘‘An X is a Y.’’ This is because the metaphor form
initially invites an inappropriate process of categorization, whereas similes are compre-
hended as comparisons from the very beginning. In contrast, if topic–vehicle pairs are con-
ventional, both forms should be equally comprehensible. Bowdle and Gentner (2005)
demonstrated that the experimental results were consistent with this prediction; moreover,
they showed that these results could not be explained in terms of metaphor aptness.
On the other hand, Glucksberg and Haught (2006b) demonstrated that novel but apt figu-
rative statements (e.g., ‘‘My lawyer is (like) a well-paid shark’’) were easier to comprehend
in the metaphor form than in the simile form. This finding is obviously inconsistent with the
prediction of the conventionality view, and therefore, they concluded that the aptness or the
quality of metaphors determines the choice of comprehension strategy. Furthermore, Jones
and Estes (2005, 2006) reported that apt metaphors were more likely to be processed as
categorizations and were comprehended faster than less apt metaphors; however, no such
differences were observed between novel and conventional metaphors.
Against these two hybrid views, Utsumi (2007) demonstrated that only the interpretive
diversity of topic–vehicle pairs was positively correlated with the relative comprehensibility
of the metaphor form, as compared with the simile form; however, neither vehicle conven-
tionality nor metaphor aptness showed a correlation with the relative comprehensibility.
Although diverse pairs were equally comprehensible in both forms, less diverse pairs were
more comprehensible when presented in the simile form than in the metaphor form. In
addition, less diverse metaphors shared more meanings with the corresponding similes
than diverse metaphors, suggesting that less diverse metaphors and similes are likely to be
understood by the same process, namely, a comparison process. Again, such a difference
was not observed for either vehicle conventionality or aptness.
As I have described earlier, recent metaphor studies have provoked a heated debate with
regard to which metaphor properties determine the choice between categorization and com-
parison processes for comprehending metaphors. However, the question of determining
which view is the most plausible remains unresolved. This study thus employs a different
approach to this issue, namely, computational modeling and simulation experiment. I
attempt to provide a computational or theoretical solution to the problem by identifying the
metaphor property from vehicle conventionality, metaphor aptness, and interpretive diver-
sity that best explains the result of model selection between comparison and categorization
models. Given the lack of metaphor research using a model comparison technique, this
study can be regarded as a pioneering study that derives evidence or knowledge about the
mechanism of metaphor comprehension through a computational method.
3. Computational model
Recently, vector-based semantic space models have been frequently used to represent
lexical meanings and have proved highly useful for a variety of natural language processing
(NLP) tasks, such as word sense disambiguation (Schütze, 1998), information retrieval
(Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Widdows, 2004), thesaurus
construction (Lin, 1998), document clustering (Shahnaz, Berry, Pauca, & Plemmons, 2006),
and essay scoring (Landauer, Laham, & Foltz, 2003). What is more important is that
semantic space models have also provided a useful framework for cognitive modeling, for
example, similarity judgment (Landauer & Dumais, 1997), semantic priming (Jones,
Kintsch, & Mewhort, 2006; Lowe & McDonald, 2000), text comprehension (Foltz, Kintsch,
& Landauer, 1998; Kintsch, 2001), and language-mediated eye movement (Huettig,
Quinlan, McDonald, & Altmann, 2006). There are also good reasons for using semantic
space models for cognitive modeling and NLP. First, semantic space models are cost-
effective in that it takes less time and effort to construct large-scale geometric
representations of word meanings than to construct other types of lexical knowledge, such
as dictionaries or thesauri. Second, they can represent the implicit knowledge of word mean-
ings that dictionaries and thesauri cannot. Lastly, semantic spaces are easy to revise and
extend.
Semantic space models are based on two main assumptions. One assumption is that the
meaning of each word wi can be represented by a high-dimensional vector v(wi) =
(wi1, wi2, …, wiD), that is, a word vector. These D real-valued components define the lexical
meaning of the word. The second assumption is that the degree of semantic similarity
sim(wi,wj) between any two words wi and wj can be computed using the similarity function
of their word vectors. Among the variety of functions that can be used to compute the simi-
larity between two word vectors in semantic space models, the cosine cos (v(wi),v(wj)) is
the most widely used. Using the similarity measure, one can easily compute the degree to
which two words are semantically related.
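As a concrete illustration, the two assumptions above can be sketched in a few lines of Python. The three-dimensional toy vectors are purely hypothetical (real LSA vectors have hundreds of dimensions and are corpus-derived):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity cos(v(wi), v(wj)) between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional word vectors (illustrative only; not LSA-derived).
v_virus = np.array([0.9, 0.1, 0.2])
v_contagion = np.array([0.8, 0.2, 0.3])
v_rumor = np.array([0.1, 0.9, 0.4])

# Semantically related words should receive the higher cosine.
print(cosine(v_virus, v_contagion) > cosine(v_virus, v_rumor))  # True
```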
Semantic spaces (or word vectors) are constructed from large bodies of text by observing
distributional statistics of word occurrence. The method for constructing word vectors
generally comprises the following two steps. First, M content words in a given corpus are
represented as R-dimensional initial word vectors, and an M × R matrix A is constructed
using the M word vectors as rows. Then, the dimension of A’s row vectors is reduced from the
initial dimension R to D and, as a result, a D-dimensional semantic space including M words
is generated.
Numerous methods have been proposed for computing initial word vectors and for reduc-
ing the dimensionality (for an overview, see Padó & Lapata, 2007). Among them, latent
semantic analysis (henceforth, LSA; Landauer & Dumais, 1997; Landauer et al., 2007) is
the most popular. LSA uses the frequency of words in a document (e.g., paragraph) to com-
pute initial vectors, whose dimension R is equal to the number of documents. LSA then
reduces the number of dimensions using singular value decomposition (henceforth, SVD).
Many studies (e.g., Kintsch, 2001; Landauer & Dumais, 1997; Landauer et al., 2007) have
demonstrated that LSA successfully mimics a variety of human behaviors, particularly those
associated with semantic processing. Hence, this study uses LSA to construct a semantic
space for computer simulation.
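The two steps just described, constructing the M × R matrix A and reducing its dimensionality by SVD, can be sketched as follows. The 4 × 3 count matrix is a toy stand-in for a real corpus; actual LSA also applies a weighting scheme (e.g., log-entropy) to the raw counts before the SVD:

```python
import numpy as np

# Toy term-document matrix A: M = 4 content words, R = 3 documents.
# Entry A[i, j] is the frequency of word i in document j.
A = np.array([
    [2.0, 0.0, 1.0],   # "virus"
    [1.0, 0.0, 2.0],   # "contagion"
    [0.0, 3.0, 0.0],   # "rumor"
    [0.0, 2.0, 1.0],   # "spread"
])

# Truncated SVD: A is approximated by U_D S_D V_D^T. Scaling the first D
# left singular vectors by the singular values yields the D-dimensional
# word vectors of the reduced semantic space.
D = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
word_vectors = U[:, :D] * S[:D]    # one row per word, shape (M, D)

print(word_vectors.shape)  # (4, 2)
```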
the other hand, the metaphorical category of contagious things does not include diseases
literally caused by a virus, such as influenza and tuberculosis, and they will be excluded
from the set of k neighbors because they are not relevant to rumor (A). Glucksberg’s
categorization theory also argues that literal categorization statements can be comprehended
in the same way as metaphorical assertions. The word virus in the literal statement ‘‘Influ-
enza is a virus’’ refers to the literal contagious virus, and the literal category can be repre-
sented as the set of k neighbors; in this case, the set of k neighbors can include meanings
that are not relevant to the metaphorical category, because the argument influenza is also a
virus. It must be noted that Kintsch (2000) briefly describes the relationship between the
predication algorithm and Glucksberg’s categorization theory. He suggests that the predica-
tion algorithm is consistent with the categorization theory, although he does not argue that it
can be considered as a computational model of categorization. Furthermore, Glucksberg
(2003) points out that the predication algorithm is very similar to the categorization process.
Let M be a given nominal metaphor with the vehicle wV (i.e., predicate) and the topic wT
(i.e., argument), and Ni(x) be a set of i neighbors of the word x (i.e., a set of words with i
highest similarity to x). The algorithm Categ(v(wT), v(wV); θcat) for computing a metaphor
vector vcat(M) for M by the process of categorization is given as follows. (Note here that the
algorithm Categ is identical to Kintsch’s [2001] predication algorithm.)
Categ(v(wT), v(wV); θcat)
1. Compute Nm(wV), that is, m neighbors of the vehicle wV.
2. Choose k words with the highest similarity to the topic wT from among Nm(wV).
3. Compute a vector vcat(M) as the centroid of v(wT), v(wV), and k vectors of the words
chosen in Step 2.
The parameter m denotes the number of vehicle neighbors that should be searched for in
the algorithm and the parameter k denotes the number of vehicle neighbors that should be
selected to be similar to the topic. The notation θcat denotes the list of these parameter values
(m, k) for the Categ algorithm.
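The three steps of Categ translate almost directly into code. The sketch below assumes the semantic space is given as a dictionary from words to numpy vectors; the function and variable names are illustrative, not part of the original algorithm:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def categ(v_topic, v_vehicle, space, m, k):
    """Sketch of Categ (Kintsch's predication algorithm), with the
    semantic space given as a dict mapping words to vectors."""
    # Step 1: the m nearest neighbors of the vehicle.
    neighbors = sorted(space, key=lambda w: cosine(space[w], v_vehicle),
                       reverse=True)[:m]
    # Step 2: among them, the k words most similar to the topic.
    chosen = sorted(neighbors, key=lambda w: cosine(space[w], v_topic),
                    reverse=True)[:k]
    # Step 3: centroid of the topic, the vehicle, and the k chosen vectors.
    vectors = [v_topic, v_vehicle] + [space[w] for w in chosen]
    return np.mean(vectors, axis=0)
```

With an LSA space and θcat = (m, k) = (500, 5), this procedure corresponds to the behavior traced in Table 3.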
For example, Table 3 shows the step-by-step behavior of the Categ algorithm when it
computes a vector for the metaphor ‘‘A rumor is a virus’’ with the parameters θcat =
(m, k) = (500, 5). The first column of Table 3 lists the top 10 and last 2 neighbors of the
vehicle virus included in the set N500(wV) computed at Step 1. (Hence, the words capsule
and estimation are the 499th and the 500th nearest neighbors.) These 500 neighbors are
sorted in descending order of cosine similarity to the topic rumor and are listed in the second
column. As k = 5, the top five words (i.e., doubt, trigger, recency, spread/get about, and
guess) are chosen for characterizing a metaphoric category of ‘‘contagious things.’’ (Note
that vehicle neighbors are generally not the nearest neighbors of the topic, although they
move closer to the topic as m grows larger. For example, the top five nearest neighbors of
the topic rumor are disclosure, prosecutor, expose, report, and conjecture.) These chosen
words do not seem to represent the metaphoric category of contagious things on their own,
but their centroid vector is close to the features of contagious things. In Step 3, these five
vectors of the chosen words are averaged with the topic and the vehicle vectors to obtain a
Table 3
An example of the step-by-step behavior of the Categ algorithm in comprehending the metaphor ‘‘A rumor is a virus’’

Nearest Neighbors of the Vehicle virus and Cosines with the Vehicle (Step 1) | Sorted List of the 500 Vehicle Neighbors and Cosines with the Topic (Step 2) | Nearest Neighbors of the Metaphor Vector and Their Cosines (Step 3)
contagion 0.94 | doubt 0.28 | recency 0.86
fungus 0.86 | trigger 0.19 | virus 0.63
tolerance 0.81 | recency 0.18 | epidemic/spread b 0.63
disease onset 0.80 | spread/get about a 0.18 | contagion 0.60
tuberculosis 0.80 | guess 0.17 | fungus 0.59
bacteria 0.80 | ovulation 0.17 | take effect 0.57
antibiotic 0.77 | sneeze 0.17 | bacteria 0.57
heated 0.75 | pregnancy 0.14 | tolerance 0.55
drug disaster 0.75 | efficacy 0.14 | of a kind 0.55
blood sampling 0.75 | appearance 0.14 | disease onset 0.55
… | … | …
capsule 0.18 | | trachea −0.11
estimation 0.18 | | pulse −0.11

a The original Japanese word hiromaru means both ‘‘spread’’ and ‘‘get about.’’
b The original Japanese word ryuko means both ‘‘epidemic’’ and ‘‘spread.’’
metaphor vector vcat(M). The rightmost column of Table 3 lists the top 10 nearest neighbors
of the metaphor vector. They can be regarded as representing the meanings of the metaphor
because, as mentioned in Section 3.1, the cosine similarity between two vectors is used as a
measure of semantic relatedness. Some nearest neighbors of the vehicle (i.e., contagion and
take effect), which are also relevant to the topic, are attributed to the topic rumor. On the
other hand, some nearest neighbors of the vehicle that are not relevant to the topic, such as
tuberculosis, are downplayed. More important, some emergent words, such as recency and
epidemic/spread, which are not close to the vehicle or the topic, but appropriate as a meta-
phorical meaning, are also attributed to the topic; a rumor spreads rapidly, just like a virus is
epidemic, and recency is an important factor for the spread of both a virus and a rumor.
Compa(v(wT), v(wV); θcom)
1. Compute k common neighbors Ni(wT) ∩ Ni(wV) of wT and wV by finding the smallest i
that satisfies |Ni(wT) ∩ Ni(wV)| ≥ k.
2. Compute a metaphor vector vcom(M) as the centroid of v(wT) and the k vectors of the
words chosen in Step 1.
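Under the same assumptions as before (a dict-based semantic space, illustrative names), the two steps of Compa can be sketched as:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def compa(v_topic, v_vehicle, space, k):
    """Sketch of Compa: grow i until the i-neighbor sets of topic and
    vehicle share at least k words, then average with the topic."""
    ranked_t = sorted(space, key=lambda w: cosine(space[w], v_topic), reverse=True)
    ranked_v = sorted(space, key=lambda w: cosine(space[w], v_vehicle), reverse=True)
    # Step 1: find the smallest i with |Ni(wT) ∩ Ni(wV)| >= k.
    for i in range(1, len(space) + 1):
        common = set(ranked_t[:i]) & set(ranked_v[:i])
        if len(common) >= k:
            break
    # Step 2: centroid of the topic vector and the common-neighbor vectors.
    vectors = [v_topic] + [space[w] for w in common]
    return np.mean(vectors, axis=0)
```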
Table 4 shows an example of how the Compa algorithm works for the metaphor ‘‘A
fisherman is a spider’’ with the parameter θcom = (k) = (3). First, the algorithm finds k
common neighbors of the topic fisherman and the vehicle spider. In this example, the first
common neighbor prey/bait is found when i = 23 (in other words, Ni(wT) ∩ Ni(wV) first
becomes non-empty when i = 23). As a result, three common neighbors prey/bait, net, and
fishing are found when i = 67. These common neighbors represent a correspondence
between fisherman and spider, in that just as a spider waits for and catches its prey in a net
(web), a fisherman waits for and catches fish (as prey) using a net. (Note that the common
term fishing means the act of catching fish.) Then in Step 2, the vectors of these common
neighbors are averaged with the topic vector such that the resulting metaphor vector is close
to both the original topic vector and common neighbors. As shown in Table 4, the resulting
metaphor vector is indeed close to the three common vectors, as well as the topic fisherman
itself and some of the topic properties, such as fishery and catch landing, which are included
in the top 10 nearest neighbors of the metaphor vector. In particular, the common neighbors
are highlighted, because their cosines with the metaphor vector are higher than those with
the original topic vector. Furthermore, for example, the term wait, which is not initially
salient for fisherman, is also highlighted, although it is not ranked among the top 10 nearest
neighbors; wait has a higher cosine with the metaphor vector (0.35) than with fisherman
(0.24). Taken together, these results mean that the Compa algorithm produces an appropri-
ate metaphor vector that represents an intuitively sensible interpretation that the fisherman’s
Table 4
An example of the step-by-step behavior of the Compa algorithm in comprehending the
metaphor ‘‘A fisherman is a spider’’

Common Neighbors of          Cosine with fisherman and    Cosine with spider and
fisherman and spider         Its Rank in Parentheses      Its Rank in Parentheses
Computed at Step 1
prey/bait a                  0.55 (18)                    0.31 (23)
net                          0.56 (16)                    0.28 (58)
fishing                      0.46 (67)                    0.30 (26)

Top 10 Nearest Neighbors of the Metaphor      Cosine with the
Vector Computed at Step 2                     Metaphor Vector
prey/bait                                     0.92
net                                           0.85
small fish                                    0.79
fishing                                       0.79
fishery                                       0.76
water temperature                             0.75
fisherman                                     0.75
angler                                        0.72
migration                                     0.71
catch landing                                 0.70

a The original Japanese word esa means both ‘‘prey’’ and ‘‘bait.’’
specific property of ‘‘waiting for and catching fish using a net’’ is emphasized by this
metaphor.
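The two steps of Compa can be sketched in a few lines of Python over a toy semantic space. The vectors and vocabulary below are invented for illustration (they are not the LSA space used in this study), and tie-breaking among common neighbors is simplified:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def neighbors(word, space, i):
    """N_i(word): the i nearest neighbors of `word` (excluding the word itself)."""
    ranked = sorted((w for w in space if w != word),
                    key=lambda w: cosine(space[word], space[w]), reverse=True)
    return set(ranked[:i])

def compa(topic, vehicle, space, k):
    # Step 1: grow i until topic and vehicle share at least k neighbors.
    for i in range(1, len(space)):
        common = neighbors(topic, space, i) & neighbors(vehicle, space, i)
        if len(common) >= k:
            break
    chosen = sorted(common)[:k]  # tie-breaking simplified in this sketch
    # Step 2: metaphor vector = centroid of the topic vector and the k common neighbors.
    centroid = np.mean([space[topic]] + [space[w] for w in chosen], axis=0)
    return centroid, chosen

# Toy space in which fisherman and spider share the neighbors net and prey.
space = {
    "fisherman": np.array([1.0, 0.2, 0.0]),
    "spider":    np.array([0.2, 1.0, 0.0]),
    "net":       np.array([0.6, 0.6, 0.0]),
    "prey":      np.array([0.5, 0.7, 0.1]),
    "fishing":   np.array([0.9, 0.1, 0.1]),
    "web":       np.array([0.1, 0.9, 0.2]),
    "boat":      np.array([0.95, 0.0, 0.3]),
}
vec, chosen = compa("fisherman", "spider", space, k=2)
# The common neighbors are highlighted: their cosine with the metaphor vector
# exceeds their cosine with the original topic vector, as in the Table 4 example.
```

As in the fisherman example, the centroid stays close to the topic while pulling the shared elements above their original salience.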
This algorithm can be regarded as a simplified model of the comparison process that com-
prises alignment and projection (Gentner, 1983; Gentner et al., 2001). The computation of
common neighbors Ni(wT) ∩ Ni(wV) of topic and vehicle at Step 1 can be reasonably
regarded as the alignment process. It is likely that the set of common neighbors includes
identical elements found in the early stage of alignment (e.g., the arguments net and prey in
the case of ‘‘A fisherman is a spider’’ and the predicates bring and help in the case of
‘‘Socrates is a midwife’’). It must be noted that according to Gentner (1983), the alignment
process is governed by the systematicity principle: a system of relations connected by higher
order relations is preferred over one with an equal number of independent matches. The
Compa algorithm does not explicitly take into account the later stage of alignment in which
structurally consistent mappings are derived from local matches according to the systematic-
ity principle; however, it implicitly deals with some aspects of this process. Predicates for
higher order relations are likely to be expressed by ambiguous words, and such ambiguous
words are likely to be common neighbors because they are similar to many words in a
semantic space. These predicates constitute consistent mappings, and as a result, the Compa
algorithm seems to prefer higher order relations in the alignment process.
Of course, I do not argue that the Compa algorithm completely embodies the later stage
of alignment (and the systematicity principle). This limitation is common to any algorithm
in semantic space models, rather than being specifically meant for the Compa algorithm,
because semantic space models at present lack the ability to represent the relational knowl-
edge of concepts expressed by words (Kintsch, 2008a). However, I do not consider this to
be a serious limitation of the Compa algorithm for the present purpose because most empiri-
cal findings regarding the debate between comparison and categorization (e.g., Bowdle &
Gentner, 2005; Chiappe & Kennedy, 1999; Glucksberg & Haught, 2006b; Jones & Estes,
2006; Utsumi, 2007; Wolff & Gentner, 2000) have been obtained for simple nominal meta-
phors, and understanding these metaphors does not require so many alignments of higher
order relations.
On the other hand, the computation of the centroid of k common neighbors and the topic
vector in Step 2 can be reasonably regarded as the projection process. Projection is a process
of transferring to the topic predicates and arguments connected to the common structure
found in the alignment process. As a result, projected predicates and arguments that are
included in the aligned structure (i.e., those common to both concepts or unique to the vehi-
cle) are highlighted, whereas the original salient properties of the topic are retained (e.g.,
Gentner & Bowdle, 2008; Gentner et al., 2001). This can be modeled as the centroid com-
putation of the vectors of k common neighbors and the topic vector, because the centroid of
multiple vectors is generally close to those vectors. It implies that the resulting centroid vec-
tor is close to the elements in the aligned structure (represented by k common neighbors), as
well as to the salient properties of the topic (represented by the topic vector). In fact, as
shown in the fisherman–spider example, the elements connected to the common structure,
that is, net and prey, which are common to both concepts, and wait, which are unique to the
vehicle but not initially salient in the topic, are highlighted by the Compa algorithm. In
A. Utsumi/Cognitive Science 35 (2011) 267
addition, the original salient properties of the topic, such as fishery, are also included in the
nearest neighbors of the metaphor vector.
Before presenting the results of the simulation experiment, I must demonstrate the plausi-
bility of the two algorithms Categ and Compa as models of categorization and comparison
processes, so that the simulation result constitutes a valid test of metaphor theories. In this
regard, to demonstrate the empirical adequacy of the model (McClelland, 2009), I show that
these algorithms are consistent with the following two distinctive processing phenomena:
1. Grammatical concordance between form and function (Bowdle & Gentner, 2005;
Chiappe & Kennedy, 1999; Gentner & Bowdle, 2008; Glucksberg & Haught, 2006b;
Utsumi, 2007).
2. Directionality in the metaphor comprehension process (Gentner & Wolff, 1997;
Wolff & Gentner, 2000).
Whether the two algorithms exhibit such directionality can be tested by examining whether
and when they yield results consistent with the asymmetry of metaphors.
Fig. 1. An illustrative example showing that the Categ algorithm generates a more plausible vector than the
Compa algorithm for a literal categorization statement: The case of ‘‘A whale is a mammal.’’ The bar chart
shows the cosine similarity between the original vector of whale and the vectors for the relevant landmark fea-
tures. Two line graphs denote the cosine similarity between the vectors vcat(S) or vcom(S) for the literal categori-
zation statement ‘‘A whale is a mammal’’ and the landmark features. It is preferable that mammalian features
are more highlighted (i.e., graphs are on the right of the bars) and whale-specific features are more downplayed
(i.e., graphs are on the left of the bars). The Categ algorithm increases the cosine similarity of the mammalian
features and decreases the cosine similarity of the whale-specific features to a greater extent than the Compa
algorithm.
The interpretation of a literal categorization statement is replicated more appropriately by the
algorithm Categ than by the algorithm Compa, as shown in Fig. 1. First, although the typical mammalian features
animal nature and suckle are highlighted by both algorithms, the Categ algorithm highlights
these mammalian features to a greater extent than the Compa algorithm; the increase in
cosine similarity from the original whale vector (denoted by the gray bars) is greater for the
sentence vector vcat(S) computed by the Categ algorithm (denoted by filled circles) than for
the sentence vector vcom(S) computed by the Compa algorithm (denoted by filled triangles).
The mean increase in cosine similarity for the two mammalian features is 0.27 for the vector
vcat(S) and 0.17 for the vector vcom(S). Second, whale-specific but non-mammalian features
such as sea and swim are downplayed by the Categ algorithm, but they are not downplayed
(or are even somewhat highlighted) by the Compa algorithm. The mean decrease in cosine similarity
from the original whale vector is 0.08 for the vector vcat(S) by Categ, which is greater than
the mean decrease of −0.01 for the vector vcom(S) by Compa. Finally, owing to these differ-
ences, the sentence vector vcat(S) of ‘‘A whale is a mammal’’ by the Categ algorithm
behaves like a mammal more appropriately than the vector vcom(S) by the Compa algorithm;
the vector vcat(S) by Categ has higher cosine with the mammalian features than with the
whale-specific features, but the vector vcom(S) by Compa inappropriately has higher cosine
with the whale-specific feature sea than with the mammalian feature suckle. From these sim-
ulation results, this example indicates that the algorithm Categ works better as a model of
categorization than the algorithm Compa.7
In contrast, it is reasonably assumed that people comprehend a literal comparison state-
ment ‘‘An X is like a Y,’’ such that only the common features shared by X and Y are high-
lighted without other Y-ness features being highlighted. For example, in comprehending ‘‘A
whale is like a dolphin,’’ people would try to seek commonality between whale and dolphin
and arrive at the interpretation in which common features (e.g., living in the sea, swimming)
are emphasized but dolphin-specific features (e.g., therapy, intelligence) are not highlighted.
Fig. 2 shows that such a pattern of interpretation can be replicated more appropriately
Fig. 2. An illustrative example showing that the Compa algorithm generates a more plausible vector than the
Categ algorithm for a literal comparison statement: The case of ‘‘A whale is like a dolphin.’’ The bar chart
shows the cosine similarity between the original vector of whale and the vectors for the relevant landmark fea-
tures. Two line graphs denote the cosine similarity between the vectors vcat(S) or vcom(S) for the literal categori-
zation statement ‘‘A whale is like a dolphin’’ and the landmark features. It is preferable that the common
features are more highlighted (i.e., graphs are on the right of the bars) and dolphin-specific features are not high-
lighted (i.e., graphs are located near the bars). The Compa algorithm increases the cosine similarity of the com-
mon features to a greater extent and the cosine similarity of the dolphin-specific features to a lesser extent than
the Categ algorithm.
by the algorithm Compa than by the algorithm Categ. (Note that the bar chart of Fig. 2
represents the cosine between the whale vector and feature vectors, and two graphs repre-
sent the cosines between the sentence vector vcat(S) or vcom(S) for the literal comparison
statement (S) ‘‘A whale is like a dolphin’’ and feature vectors.) First, the common features
sea and swim are highlighted by the Compa algorithm (i.e., they are closer to the sentence
vector vcom(S) than to the original whale vector), but the Categ algorithm undesirably down-
plays the common feature sea. Moreover, the increase in the cosine similarity of the feature
swim is greater for the Compa algorithm than for the Categ algorithm. Second, the unshared
dolphin-specific features therapy and intelligent are less highlighted by the Compa algo-
rithm than by the Categ algorithm. The mean increase in cosine similarity of the two dolphin-
specific features is 0.06 for the vector vcom(S), which is smaller than the mean increase of 0.21
for the vector vcat(S). Finally, as a result, the Compa algorithm generates an intuitively plau-
sible sentence vector that is more similar (and thus, semantically more related) to the com-
mon features than to the dolphin-specific features. The Categ algorithm, however, does not
generate such a plausible vector; the generated sentence vector is less similar to the common
features than to the dolphin-specific feature therapy. These simulation results indicate that
the Compa algorithm works better as a model of comparison than the Categ algorithm.
Table 5
Asymmetry between the metaphor ‘‘A rumor is a virus’’ and its reversed metaphor ‘‘A virus is a
rumor’’ generated by the Categ algorithm

                       contagion             scandal
                   Cosine   ΔCosine      Cosine   ΔCosine
Rumor              0.00     –            0.32     –
Virus              0.94     –            0.12     –
A rumor is a virus 0.60     0.60         0.10     −0.22
A virus is a rumor 0.52     −0.42        0.31     0.19

Notes. Cosine denotes the cosine similarity to the two landmarks contagion and scandal. ΔCosine
denotes the increase in cosine similarity by metaphorization, which is equal to (Cosine of the
metaphor) − (Cosine of the topic alone).
Table 6
Asymmetry between the metaphor ‘‘Deserts are ovens’’ and its reversed metaphor ‘‘Ovens are
deserts’’ generated by the Compa algorithm

                       dry                 vast                dish
                   Cosine  ΔCosine     Cosine  ΔCosine     Cosine  ΔCosine
Deserts            0.28    –           0.56    –           −0.02   –
Ovens              0.35    –           −0.02   –           0.70    –
Deserts are ovens  0.78    0.50        0.41    −0.15       0.29    0.31
Ovens are deserts  0.78    0.43        0.37    0.39        0.32    −0.38

Notes. Cosine denotes the cosine similarity to the three landmarks. ΔCosine denotes the
increase in cosine similarity by metaphorization, which is equal to (Cosine of the metaphor)
− (Cosine of the topic alone).
virus’’ than with rumor, which is shown by ΔCosine = −0.22). On the other hand, the
vector for ‘‘A virus is a rumor’’ shows a different result; it highlights the feature scandal
(ΔCosine = 0.19) and downplays the feature contagion (ΔCosine = −0.42). Although the
cosine similarity of contagion is still higher than that of scandal, this pattern may reflect the
intuition that the reversed metaphor ‘‘A virus is a rumor’’ does not make sense. This mean-
ingless metaphor cannot appropriately describe the relevant features of a virus, and thus, the
originally salient features of a virus may be still salient in the metaphor. Likewise, Table 6
shows that the Compa algorithm (with the parameter θcom = (k) = (3)) reflects the asymme-
try of the metaphor; two metaphor vectors differ in that the vector for ‘‘Deserts are ovens’’
highlights the two features dry and dish, whereas the vector for ‘‘Ovens are deserts’’ high-
lights different features dry and vast. Note that the Compa algorithm may highlight common
properties (e.g., dry in this case) shared by the topic and the vehicle regardless of their order.
This tendency is consistent with the existing findings that reversed similes (i.e., reversed
figurative comparisons) preserved the original interpretation better and lowered the
comprehensibility to a lesser extent than the reversed metaphors (Chiappe, Kennedy, &
Smykowski, 2003).
With regard to when the algorithms Categ and Compa generate the asymmetry of meta-
phors, they differ in the stage at which directionality arises during computation, just as the
categorization and comparison processes differ as to when the asymmetry appears during
metaphor comprehension. In general, the algorithm Categ is initially asymmetrical in the
same way as the categorization process. The first step (Step 1) of the algorithm Categ com-
putes a set of m neighbors of the vehicle, that is, the set of Y’s neighbors for the original
metaphor ‘‘An X is a Y,’’ and the set of X’s neighbors for the reversed metaphor ‘‘A Y is
an X.’’ It is very likely that these two sets are not only different but also have quite a small
overlap, unless X and Y have very similar vectors. Particularly in the case of metaphors, X
and Y are not semantically similar, and thus, it seems much less likely that two sets of
neighbors would be identical; in many cases, they do not even overlap (especially when the
number of vehicle neighbors m is small). As a result, the second step (Step 2) chooses a dif-
ferent set of k neighbors between the original metaphor and its reversed metaphor. For
example, Table 7 lists 20 neighbors of the vehicle for the metaphors ‘‘A rumor is a virus’’
Table 7
Twenty neighbors of the vehicle computed at Step 1 of the Categ algorithm in comprehending
the metaphor ‘‘A rumor is a virus’’ and its reversed metaphor ‘‘A virus is a rumor’’
‘‘A rumor is a virus’’ ‘‘A virus is a rumor’’
(neighbors of virus) (neighbors of rumor)
contagion               disclosure
fungus                  prosecutor
tolerance               expose
disease onset           report
tuberculosis            conjecture
bacteria                lady
antibiotic              surprised
heated                  trouble
drug disaster           public prosecutors office
blood sampling          aide
blood donation          resignation
administration          scandal
prevention              tale
take effect             fact
blood transfusion       business trip
blood                   reveal
immunity                monthly
vaccine                 illegitimate child
chronic                 mass media
side-effect             disavow
Note. The words are listed in descending order of cosine similarity to the vehicle.
and ‘‘A virus is a rumor’’ computed in Step 1 of the Categ algorithm. In this example, the
two sets of vehicle neighbors do not overlap. Hence, when m ≤ 20, the sets of k neighbors
chosen at Step 2 are inevitably different. Even in the case of m = 500 and k = 5, the sets of
vehicle neighbors N500(virus) and N500(rumor) only have two common words (i.e., doubt,
trigger). These two common words are chosen at Step 2 for both metaphors, but the other
three words differ between them. The words recency, spread/get about, and guess are cho-
sen for ‘‘A rumor is a virus,’’ as shown in Table 3, whereas fear, topic, and emerge/show up
are chosen for the reversed metaphor ‘‘A virus is a rumor.’’ Furthermore, when the reversed
versions of all the metaphors used in the simulation of Section 4 are computed by the Categ
algorithm with the optimal parameters, none of the reversed metaphors produce the same
set of m neighbors in Step 1 and the same set of k neighbors in Step 2 as the original meta-
phors. These results show that the Categ algorithm is basically asymmetrical from the
beginning.
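The two steps of Categ summarized above can be sketched in the same way. The full definition of the algorithm appears earlier in the article, outside this excerpt; the sketch therefore assumes that Step 2 selects, from the vehicle's m neighbors, the k words closest to the topic and averages them with the topic vector, and the toy vectors are invented:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def categ(topic, vehicle, space, m, k):
    # Step 1: the m nearest neighbors of the VEHICLE. Reversing the metaphor
    # swaps topic and vehicle, so this set changes from the start.
    ranked = sorted((w for w in space if w != vehicle),
                    key=lambda w: cosine(space[vehicle], space[w]), reverse=True)
    vehicle_nbrs = ranked[:m]
    # Step 2: keep the k of those words closest to the topic (assumed selection
    # criterion) and average them with the topic vector.
    chosen = sorted(vehicle_nbrs,
                    key=lambda w: cosine(space[topic], space[w]), reverse=True)[:k]
    return np.mean([space[topic]] + [space[w] for w in chosen], axis=0), chosen

# Toy space: the neighborhoods of rumor and virus barely overlap.
space = {
    "rumor":     np.array([1.0, 0.0, 0.2]),
    "virus":     np.array([0.0, 1.0, 0.2]),
    "scandal":   np.array([0.9, 0.1, 0.0]),
    "gossip":    np.array([0.8, 0.0, 0.4]),
    "contagion": np.array([0.1, 0.9, 0.0]),
    "vaccine":   np.array([0.0, 0.85, 0.3]),
    "spread":    np.array([0.5, 0.5, 0.3]),
}
_, fwd = categ("rumor", "virus", space, m=2, k=1)  # "A rumor is a virus"
_, rev = categ("virus", "rumor", space, m=2, k=1)  # "A virus is a rumor"
# The reversed metaphor starts from a different neighbor set, so the chosen
# words differ: the computation is asymmetrical from the beginning.
```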
On the other hand, following the same steps as the comparison process, the algorithm
Compa is initially symmetrical and asymmetrical later. The first step (Step 1) of the algo-
rithm computes k common neighbors of the topic and the vehicle, and thus it is obviously
symmetrical; the Compa algorithm computes the same set of common neighbors for the
reversed metaphors as the original metaphor. The second step (Step 2) computes the
centroid of the topic and k common words as the metaphor vector, and the topic X of the
original metaphor ‘‘An X is a Y’’ is different from the topic Y of the reversed metaphors
‘‘A Y is an X.’’ Hence, Step 2 produces different metaphor vectors when the word order is
reversed, which means that Step 2 is asymmetrical.
Note that, in almost all cases (particularly in the case of a small m), the k common neigh-
bors computed at Step 1 of the Compa algorithm differ from the k neighbors computed at
Step 2 of the Categ algorithm. Words that are highly similar to both the vehicle and the
topic are likely to be included in both sets of k neighbors (i.e., k neighbors computed by
Compa and those computed by Categ); however, such words are quite rare, given that the
topic and the vehicle of metaphors are usually semantically dissimilar. Hence, most of the
vehicle neighbors are not neighbors of the topic.
To sum up the discussion, both algorithms can produce the asymmetry of metaphor.
Furthermore, the Categ algorithm behaves like categorization in that the computation is
asymmetrical from the beginning, whereas the Compa algorithm behaves like comparison in
that the computation is initially symmetrical and asymmetrical later. This consistency
strengthens the plausibility of the algorithms as a model of categorization and comparison.
4. Simulation experiment
This section presents the details and the results of the simulation experiment comprising
model selection and theory testing. The overall procedure of model selection and theory
testing is summarized as follows and is illustrated in Fig. 3.
Fig. 3. An illustration of the model selection and theory testing procedure. The numbers in parentheses denote
the sections in which the corresponding procedures are explained.
Table 8
Metaphors used in the simulation experiment
1. Life is a journey. (Jinsei ha tabi da) 2. Life is a game. (Jinsei ha ge-mu da)
3. Love is a journey. (Ai ha tabi da) 4. Love is a game. (Ai ha ge-mu da)
5. Anger is the sea. (Ikari ha umi da) 6. Anger is a storm. (Ikari ha arashi da)
7. Sleep is the sea. (Nemuri ha umi da) 8. Sleep is a storm. (Nemuri ha arashi da)
9. Perfume is a bouquet. (Ko-sui ha hanataba da) 10. Perfume is ice. (Ko-sui ha koori da)
11. A star is a bouquet. (Hoshi ha hanataba da) 12. A star is ice. (Hoshi ha koori da)
13. A sky is a mirror. (Sora ha kagami da) 14. A sky is a lake. (Sora ha mizuumi da)
15. An eye is a mirror. (Me ha kagami da) 16. An eye is a lake. (Me ha mizuumi da)
17. A lover is the sun. (Koibito ha taiyo da) 18. A lover is a rainbow. (Koibito ha niji da)
19. One’s hope is the sun. (Kibou ha taiyo da) 20. One’s hope is a rainbow. (Kibou ha niji da)
21. A child is water. (Kodomo ha mizu da) 22. A child is a jewel. (Kodomo ha houseki da)
23. Words are water. (Kotoba ha mizu da) 24. Words are jewels. (Kotoba ha houseki da)
25. An elderly person is a doll. 26. An elderly person is a deadwood.
(Roujin ha ningyou da) (Roujin ha kareki da)
27. One’s voice is a doll. (Koe ha ningyou da) 28. One’s voice is a deadwood. (Koe ha kareki da)
29. One’s character is fire. (Seikaku ha hi da) 30. One’s character is a stone. (Seikaku ha ishi da)
31. A marriage is fire. (Kekkon ha hi da) 32. A marriage is a stone. (Kekkon ha ishi da)
33. Death is the night. (Shi ha yoru da) 34. Death is the fog. (Shi ha kiri da)
35. Anxiety is the night. (Fuan ha yoru da) 36. Anxiety is the fog. (Fuan ha kiri da)
37. Time is money. (Jikan ha okane da) 38. Time is an arrow. (Jikan ha ya da)
39. Memory is money. (Kioku ha okane da) 40. Memory is an arrow. (Kioku ha ya da)
Note. The original Japanese expressions used in the experiment are shown in parentheses, preceded by their
literal English translations.
1. Forty Japanese metaphors of the form ‘‘An X is a Y’’ were used for the simulation
experiment, as listed in Table 8. The human interpretation data (i.e., a list of mean-
ings W(M) and its salience distribution p in Fig. 3) of these metaphors, their ratings of
vehicle conventionality and metaphor aptness, and their interpretive diversity values
were obtained beforehand in a previous experiment (Utsumi, 2007). Section 4.1 and
the Appendix A will explain how these data were obtained.
2. For each of the 40 metaphors, optimal parameters θ̂cat and θ̂com of the two algorithms
Categ and Compa were estimated by the maximum likelihood method as follows.
(a) For given parameter values θ, an algorithm (Categ or Compa) computed the
similarity distribution q(θ) (i.e., qcat(θcat) or qcom(θcom) in Fig. 3) for the list of
meanings W(M).8 The method for computing the similarity distribution is
described in Section 4.2.
(b) The Kullback–Leibler divergence D(p || q(θ)) (henceforth, KL-divergence) between
the computed similarity distribution q(θ) and the salience distribution p is com-
puted as a measure of the match between the model and data. KL-divergence
and its relation to the maximum likelihood method will be described in Sec-
tion 4.3.
(c) The optimal parameter θ̂ (i.e., θ̂cat or θ̂com) is obtained by finding the parameter
values that minimize the KL-divergence, which is described in Section 4.4.
3. For each metaphor, the two algorithms (i.e., models) were compared using
Akaike’s information criterion (henceforth, AIC), which is a measure of statisti-
cal model selection considering the tradeoff between the model’s precision (i.e.,
the maximum log-likelihood for the model computed by the minimum KL-
divergence) and complexity (i.e., the number of free parameters of the model).
The model with a smaller AIC is selected as the best one. Hence, if the AIC of
the Categ algorithm (denoted by AICcat in Fig. 3) is smaller than the AIC of
the Compa algorithm (denoted by AICcom), the categorization model is selected
as the best one. Likewise, if AICcat is greater than AICcom, the comparison
model is selected as the best one. The details of AIC and the result of model
selection are described in Section 4.5.
4. For each metaphor and its selected model, whose optimal similarity distribution is
denoted by q̂ in Fig. 3, the goodness-of-fit between the model and the data is tested using
a chi-square test, owing to the well-known fact that KL-divergence can be approxi-
mated by the chi-square statistic. Metaphors that exhibit a significant discrepancy between the
model and data are excluded from the subsequent analysis. The issue of the good-
ness-of-fit test is described in Section 4.6.
5. A linear discriminant analysis is conducted with the selected model (i.e., categoriza-
tion or comparison) as the dependent variable and three metaphor properties (vehicle
conventionality, metaphor aptness, and interpretive diversity) as the independent
variables. The method and the result of the discriminant analysis are described in
Section 4.7.
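Step 3 above, the AIC comparison, can be sketched as follows, using the textbook definition AIC = −2 ln L + 2 × (number of free parameters), with two free parameters (m, k) for Categ and one (k) for Compa:

```python
def aic(max_log_likelihood, n_free_params):
    # Akaike's information criterion: a fit term plus a complexity penalty.
    return -2.0 * max_log_likelihood + 2.0 * n_free_params

def select_model(loglik_cat, loglik_com):
    # Categ has two free parameters (m, k); Compa has one (k).
    aic_cat = aic(loglik_cat, 2)
    aic_com = aic(loglik_com, 1)
    return "categorization" if aic_cat < aic_com else "comparison"

# Even with a slightly better fit (higher log-likelihood), Categ can lose
# the comparison because of its extra free parameter.
```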
This study employed 40 metaphors, as shown in Table 8. They were created from 10
groups, each of which comprised two topic words and two vehicle words. For each group,
four metaphors were created from all possible pairings of two topic words with two vehicle
words. For example, from the two topics, anger and sleep, and the two vehicles, sea and
storm, the following four metaphors were created: ‘‘Anger is the sea,’’ ‘‘Anger is a storm,’’
‘‘Sleep is the sea,’’ and ‘‘Sleep is a storm.’’ Topic and vehicle words were selected from an
experimental study on Japanese metaphor and a list of words frequently used for Japanese
metaphors so that they are highly frequent and familiar.
For human interpretation data of metaphors, this study employed the result of the psycho-
logical experiment (Experiment 2) that Utsumi (2007) conducted using the same 40 meta-
phors. This experiment addressed the difference in the comprehensibility between the
metaphor form and the simile form of a topic–vehicle pair and demonstrated that among the
three hybrid metaphor theories, the interpretive diversity view best explained the observed
comprehensibility difference. This study used some of the results obtained in this experi-
ment, namely, the listed meanings of metaphors (with the number of participants who listed
each meaning), ratings of vehicle conventionality and metaphor aptness, and interpretive
diversity values. A detailed procedure for obtaining these results in this experiment is pro-
vided in the Appendix A.
For each metaphor M, a list W(M) of metaphorical meanings wi with the number of par-
ticipants xi who listed that meaning was provided by Utsumi’s (2007) experiment. These
meanings were used as landmarks with respect to which the computational model’s interpre-
tation and human interpretation were compared for evaluation. Note that in the experiment,
participants were instructed that they should write down three or more meanings by single
words wherever possible; as a result, the final list of meanings included only single words,
which corresponds to the unit of representation of the semantic space model. For example,
the list of meanings for the metaphor ‘‘Anger is the sea’’ includes eight features, such as
fearful/dreadful, rage/stormy, and deep, as shown in Fig. 4.
Using these data, the degree of salience pi for each meaning wi in the list W(M) is defined
as the ratio of the number of participants xi to the total number of tokens N = Σⁿⱼ₌₁ xj,
where n = |W(M)|:

\[ p_i = \frac{x_i}{\sum_{j=1}^{n} x_j} = \frac{x_i}{N}. \qquad (1) \]
This definition of salience is almost identical to the definition of salience for the prototype rep-
resentation of concepts by Smith, Osherson, Rips, and Keane (1988), and it reflects the
subjective frequency with which the feature (i.e., meaning) occurs in people’s interpreta-
tions of the metaphor. This definition is psychologically plausible because it has been
pointed out that frequency is closely related to salience (Giora, 2003; Tversky, 1977). For
example, the bar chart of Fig. 4 indicates the degree of salience of the eight features that the
Fig. 4. Simulation results for the metaphor ‘‘Anger is the sea.’’ The bar chart indicates the degree of salience pi
of the human interpretation. Two line graphs indicate the normalized degree of similarity qcat,i(θ̂cat) or
qcom,i(θ̂com) computed by the Categ or Compa algorithms. The closer a line graph is to the bar chart, the better is
the match between its corresponding model and the human data. For six of the eight features (i.e., fearful/dreadful,
rage/stormy, deep, surge, wave, strong), the normalized degree of similarity qcom,i(θ̂com) computed by the
Compa algorithm is closer to the human data p than the degree of similarity qcat,i(θ̂cat) by the Categ algorithm.
This result indicates that the metaphor ‘‘Anger is the sea’’ is comprehended by the comparison process.
participants listed as the meaning of the metaphor ‘‘Anger is the sea.’’ The meaning fearful/
dreadful had the highest salience of 0.25, indicating that the number of participants who
listed it was the largest.
The interpretive diversity of each metaphor M was calculated using Shannon’s entropy
H(p), defined by the following formula (Utsumi, 2005, 2007):

\[ H(p) = -\sum_{i=1}^{n} p_i \log p_i. \qquad (2) \]
For example, the interpretive diversity of the metaphor ‘‘Anger is the sea’’ in Fig. 4 was
calculated as 2.71, given that the bar length for a feature wi corresponds to pi. The mean
interpretive diversity across the 40 metaphors used in the experiment was 3.01 (SD ¼ 0.42),
ranging from 2.09 (‘‘One’s character is a stone,’’ numbered 30 in Table 8) to 3.76 (‘‘An
eye is a mirror,’’ numbered 15).
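Eqs. 1 and 2 are straightforward to compute from raw listing counts. In the sketch below, the counts are invented, and the base-2 logarithm is an assumption (the excerpt does not state the base, but base 2 is consistent with the reported values, e.g., 2.71 for eight features, below the maximum of log2 8 = 3):

```python
import math

def salience(counts):
    # Eq. 1: p_i = x_i / N, where N is the total number of listed tokens.
    N = sum(counts.values())
    return {w: x / N for w, x in counts.items()}

def interpretive_diversity(counts):
    # Eq. 2: H(p) = -sum_i p_i log p_i. Base 2 is assumed here.
    p = salience(counts)
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)

# Invented counts: eight features listed equally often give the maximum
# diversity log2(8) = 3 bits; a skewed distribution gives less.
uniform = {f"feature{i}": 5 for i in range(8)}
skewed = {"fearful": 20, "deep": 5, "wide": 5, "wave": 5}
```

A flat salience distribution thus yields high interpretive diversity, while a distribution dominated by one feature yields low diversity.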
For vehicle conventionality and metaphor aptness, this study used the mean ratings for
each metaphor obtained in the previous experiment (Utsumi, 2007). The metaphors were
rated on a 7-point scale ranging from 1 (very novel) to 7 (very conventional) for convention-
ality, or from 1 (not at all apt) to 7 (extremely apt) for aptness. The details of the rating
experiment are provided in the Appendix A. The mean conventionality rating across the 40
metaphors was 4.46 (SD ¼ 1.19), ranging from 1.83 (‘‘Memory is an arrow,’’ numbered 40
in Table 8) to 6.28 (‘‘A lover is the sun,’’ numbered 17). The mean aptness rating was 3.70
(SD ¼ 1.07), ranging from 1.83 (‘‘One’s voice is a doll,’’ numbered 27) to 6.00 (‘‘Life is a
journey,’’ numbered 1).
3. Finally, the similarity distribution q(θ) (i.e., qcat(θcat) or qcom(θcom)) was calculated by
the following formulas:

\[ q_i(\theta) = \frac{d_i(M;\theta)}{\sum_{j=1}^{n} d_j(M;\theta)}, \qquad (3) \]

\[ d_i(M;\theta) = \cos(v(w_i), v(M;\theta)) - \min_{x \in X} \{\cos(v(x), v(M;\theta))\}. \qquad (4) \]
In Eq. 4, X denotes the set of all words in the semantic space, and thus, min_{x∈X}
{cos(v(x), v(M;θ))} denotes the smallest (minimum) cosine value between the meta-
phor vector v(M;θ) and any word vector v(x) in the semantic space. Therefore,
di(M;θ) expresses the deviation of wi’s cosine similarity from the minimum cosine.
Equation 3 shows that the normalized degree of similarity qi(θ) for the feature wi is
defined as the ratio of this deviation of cosine, so that it is analogous to the degree
of salience p defined in Eq. 1. The reason for using the deviation of cosine similarity
instead of the cosine similarity itself is that cosine can take a negative value, and thus,
the absolute cosine value does not necessarily reflect the degree of similarity; for
example, a zero cosine does not imply that there is no similarity.
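Eqs. 3 and 4 can be computed as follows. The vectors are toy inventions, and the minimum is taken over a small stand-in vocabulary rather than the full LSA space:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_distribution(meaning_vecs, metaphor_vec, vocab_vecs):
    # Eq. 4: deviation of each meaning's cosine from the minimum cosine over
    # the vocabulary, which makes every d_i non-negative.
    floor = min(cosine(v, metaphor_vec) for v in vocab_vecs)
    d = [cosine(v, metaphor_vec) - floor for v in meaning_vecs]
    # Eq. 3: normalize so the values sum to 1, like the salience p_i in Eq. 1.
    total = sum(d)
    return [di / total for di in d]

# Toy example: two listed meanings evaluated against a three-word vocabulary.
metaphor = np.array([1.0, 1.0, 0.0])
meanings = [np.array([1.0, 0.8, 0.1]), np.array([0.2, 1.0, 0.5])]
vocab = meanings + [np.array([-1.0, 0.5, 0.0])]  # includes a dissimilar word
q = similarity_distribution(meanings, metaphor, vocab)
```

Subtracting the vocabulary-wide minimum shifts all cosines to a non-negative scale before normalization, which is exactly why the deviation rather than the raw cosine is used.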
For instance, the two line graphs in Fig. 4 illustrate the similarity distributions computed
from the two metaphor vectors for the metaphor ‘‘Anger is the sea.’’ The solid line with filled
circles depicts the similarity distribution qcat(θcat) computed by the categorization algorithm
Categ with θcat = (m,k) = (10,7), and the dotted line with filled triangles depicts the simi-
larity distribution qcom(θcom) computed by the comparison algorithm Compa with θcom =
(k) = (1). This figure shows, for example, that the normalized degree of similarity qi(θ)
for the feature fearful/dreadful is qcat,i(θcat) = 0.157 for the Categ algorithm and
qcom,i(θcom) = 0.247 for the Compa algorithm.
As mentioned in the beginning of this section, the match between the model and data can
be assessed as the degree of similarity between the computed distribution q(h) and the
human salience distribution p. The greater the similarity between the two distributions, the
better is the algorithm’s simulation of human interpretation.
To quantitatively evaluate similarity or dissimilarity between the two distributions, this
study used KL-divergence, which is also known as relative entropy. KL-divergence is the
most popular measure of dissimilarity between two probability distributions and has been
applied in computational semantics as a semantic similarity measure (e.g., Hu et al., 2006;
Manning & Schütze, 1999). The KL-divergence D(p || q(h)) of the computed similarity dis-
tribution q(h) relative to human salience distribution p is given by the following formula:
$$D(p \,\|\, q(h)) = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i(h)}. \qquad (5)$$
As it measures how badly the model’s distribution q(h) approximates observed distribution
p, lower divergence implies better performance.
A. Utsumi/Cognitive Science 35 (2011) 279
Given that the human salience distribution is $p_i = x_i / N$, where $x_i$ is the observed frequency of the feature $w_i$ and $N = \sum_i x_i$, the KL-divergence in Eq. 5 can be rewritten as follows:

$$D(p \,\|\, q(h)) = \sum_{i=1}^{n} p_i \ln p_i - \sum_{i=1}^{n} p_i \ln q_i(h) = -H(p) - \frac{1}{N} \sum_{i=1}^{n} x_i \ln q_i(h) = -H(p) - \frac{1}{N} \ln L(h). \qquad (6)$$

The first term $-H(p)$ on the right-hand side of Eq. 6 is the (negative) interpretive diversity defined by Eq. 2 and is independent of h, and the second term is the (negative) log-likelihood function $\ln L(h) = \sum_{i=1}^{n} x_i \ln q_i(h)$ (divided by N) for h under the distribution q(h). Therefore, minimizing D(p || q(h)) is equivalent to maximizing the log-likelihood function.
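The identity in Eq. 6 can be checked numerically. The feature counts and model distribution below are hypothetical stand-ins, not values from the study.

```python
import math

def kl_divergence(p, q):
    """Eq. 5: D(p || q) = sum_i p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical feature counts x_i (N = 40) defining the salience
# distribution p, and a hypothetical model distribution q.
N = 40
x = [12, 10, 8, 6, 4]
p = [xi / N for xi in x]
q = [0.25, 0.25, 0.20, 0.15, 0.15]

D = kl_divergence(p, q)
H = -sum(pi * math.log(pi) for pi in p)               # entropy H(p)
lnL = sum(xi * math.log(qi) for xi, qi in zip(x, q))  # log-likelihood ln L
# Eq. 6: D(p || q) = -H(p) - (1/N) ln L
assert abs(D - (-H - lnL / N)) < 1e-12
```

Because −H(p) does not depend on the model, any parameter setting that lowers the divergence necessarily raises the likelihood, and vice versa.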
For each metaphor, optimal parameters ĥ_cat and ĥ_com of the two algorithms (i.e., categorization and comparison models) were computed by finding the parameter values that minimize the KL-divergence, that is, maximize the likelihood function. The parameter space
was given such that the parameter m varied between 10 and 45 in steps of 5 and between 50
and 500 in steps of 50, and the parameter k varied between 1 and 10.
In the case of the metaphor "Anger is the sea," for example, the parameter h_cat = (m,k) = (10,7) minimized the KL-divergence of the categorization model (i.e., Categ algorithm), and the parameter h_com = (k) = (1) minimized the KL-divergence of the comparison model (i.e., Compa algorithm). (Two line graphs in Fig. 4 show the similarity
distributions at these optimal parameters.) The minimum KL-divergence was 0.147 for the
categorization model and 0.0854 for the comparison model. These KL-divergence values
indicate that the similarity distribution computed by the comparison model (i.e., the dotted
line with filled triangles) is more similar to the human salience distribution (i.e., bar chart)
than the similarity distribution computed by the categorization model (i.e., the solid line
with filled circles).
To compare the two models while considering the tradeoff between the model’s precision
and complexity, this study used AIC, which has been widely used as a tool for statistical
model selection (e.g., Wagenmakers & Farrell, 2004). In general, AIC is given by:
$$\mathrm{AIC} = -2 \ln L(\hat{h}) + 2K, \qquad (7)$$
where L(ĥ) is the maximum value of the likelihood function for the model, and K is the number of parameters in the model. Smaller AIC values represent more plausible models. Hence, the model with the smallest AIC can be selected as the best one.
In this study, the AIC value can be calculated by:
$$\mathrm{AIC} = -2N \sum_{i=1}^{n} p_i \ln q_i(\hat{h}) + 2K, \qquad (8)$$
where K = 2 for the categorization model (because the algorithm Categ has two parameters m and k), and K = 1 for the comparison model (because the algorithm Compa has only one parameter k). For each metaphor, ΔAIC was calculated as the difference between the AIC value of the categorization model (AIC_cat) and the AIC value of the comparison model (AIC_com).
If ΔAIC > 0 (i.e., AIC_com is less than AIC_cat), the comparison model (Compa) was selected as the one that best approximated the underlying comprehension process; conversely, if ΔAIC < 0 (i.e., AIC_cat is less than AIC_com), then the categorization model (Categ) was selected.
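The model-selection rule in Eqs. 7 and 8 can be sketched as below; the distributions are hypothetical, while the parameter counts K = 2 and K = 1 follow the text.

```python
import math

def aic(p, q_hat, N, K):
    """Eq. 8: AIC = -2N * sum_i p_i ln q_i(h_hat) + 2K."""
    log_fit = sum(pi * math.log(qi) for pi, qi in zip(p, q_hat) if pi > 0)
    return -2 * N * log_fit + 2 * K

def select_model(p, q_cat, q_com, N):
    """Delta-AIC = AIC_cat - AIC_com; positive values favor comparison.
    K = 2 for the categorization model, K = 1 for the comparison model."""
    delta = aic(p, q_cat, N, K=2) - aic(p, q_com, N, K=1)
    return ("comparison" if delta > 0 else "categorization"), delta

# Hypothetical distributions (not the study's data); q_com tracks p more
# closely here, so the comparison model should win.
p = [0.4, 0.3, 0.2, 0.1]
q_cat = [0.30, 0.30, 0.25, 0.15]
q_com = [0.38, 0.32, 0.20, 0.10]
model, delta = select_model(p, q_cat, q_com, N=40)
assert model == "comparison" and delta > 0
```

Note that the 2K penalty slightly handicaps the categorization model, which, having an extra free parameter, could otherwise win merely by overfitting.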
The result of the model selection for all 40 metaphors was that the categorization model was selected for 11 metaphors (the metaphors numbered 3, 17, 18, 25, 28, 31, 32, 33, 37, 38, and 39 in Table 8), and the comparison model was selected for 29 metaphors. The mean ΔAIC for the 11 metaphors judged to be comprehended as categorizations was −2.71 (SD = 2.64), and that for the 29 metaphors judged to be comprehended as comparisons was 1.87 (SD = 1.38). For example, in the case of "Anger is the sea" in Fig. 4, the AIC value of the categorization model was 165.98, and that of the comparison model was 159.07. Because ΔAIC = 6.91 was positive, the comparison model was selected as the best model, suggesting that the metaphor "Anger is the sea" is likely to be comprehended as a comparison, rather than as a categorization. Indeed, Fig. 4 shows that for six of eight features, the normalized degree of similarity q_com,i(ĥ_com) computed by the comparison model is closer to the human data than the degree of similarity q_cat,i(ĥ_cat) computed by the categorization model. Furthermore, the comparison model correctly distinguishes the three most salient features (i.e., fearful, rage, and deep) from other less salient features by computing the degree of similarity, whereas the categorization model does not.
For each metaphor and its selected model, the chi-square goodness-of-fit test was conducted to examine whether the match between the model and data was significant. Metaphors that displayed a significant discrepancy between the model and data would be excluded from the subsequent analysis. (Note that the goodness-of-fit test was not applied to the model that was not selected by the AIC model selection procedure, because the goodness-of-fit for the model not selected has no influence on the subsequent analysis.)
It is well known that the KL-divergence can be approximated by the chi-square statistic (divided by 2N), because the two are identical up to third-order terms (Cover & Thomas, 2006):

$$D(p \,\|\, \hat{q}) = \frac{1}{2} \sum_{i=1}^{n} \frac{(p_i - \hat{q}_i)^2}{\hat{q}_i} + O(\|p_i - \hat{q}_i\|^3) \approx \frac{1}{2N} \sum_{i=1}^{n} \frac{N (p_i - \hat{q}_i)^2}{\hat{q}_i} = \frac{1}{2N} \chi^2. \qquad (10)$$

Hence, the discrepancy between the model distribution and the observed human distribution is significant (i.e., the null hypothesis that the data distribution follows the model distribution is rejected) if the KL-divergence multiplied by 2N exceeds the critical value of the chi-square distribution with n − 1 degrees of freedom:

$$2N \, D(p \,\|\, \hat{q}) \geq \chi^2_{1-\alpha}(n-1). \qquad (11)$$
The result of the goodness-of-fit test for all the metaphors was that none of the fits of the selected model to the data were rejected at a significance level of α = 0.05; thus, all 40 metaphors were included in the subsequent analysis. For example, in the case of the metaphor "Anger is the sea" in Fig. 4, the selected model (i.e., comparison) was accepted as a good fit to the human data; the minimum KL-divergence (= 0.0854) multiplied by 2N (= 2 × 40) equals 6.832, which did not exceed the critical value 14.07 (d.f. = 7).
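The decision rule in Eq. 11 amounts to a one-line comparison; the sketch below replays the paper's "Anger is the sea" numbers (D = 0.0854, N = 40, and the critical value 14.07 for d.f. = 7 at α = .05).

```python
def goodness_of_fit(kl_div, N, critical_value):
    """Eq. 11: the model's fit is rejected if 2N * D(p || q_hat) exceeds
    the chi-square critical value with n - 1 degrees of freedom."""
    statistic = 2 * N * kl_div
    return statistic, statistic > critical_value

# "Anger is the sea": D = 0.0854, N = 40, n = 8 features, so d.f. = 7
# and the critical value at alpha = .05 is 14.07.
statistic, rejected = goodness_of_fit(0.0854, N=40, critical_value=14.07)
# statistic = 6.832 < 14.07, so the fit is not rejected
assert not rejected
```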
A linear discriminant analysis was conducted to reveal the metaphor properties that determine the choice of comprehension process. The dependent variable was whether the selected model (i.e., the comprehension process) is categorization or comparison. The independent variables comprised three metaphor properties, namely, vehicle conventionality, metaphor aptness, and interpretive diversity, whose pairwise correlations were r = .36 between conventionality and aptness, r = −.30 between conventionality and diversity, and r = −.14 between aptness and diversity.
Table 9 shows the result of the discriminant analysis based on all 40 metaphors. The analysis yielded a significant discriminant function, Wilks's lambda = 0.70, F(3,36) = 5.08, p < .005. The function correctly classified 32 of 40 metaphors (80.0%), and the kappa coefficient of agreement κ = 0.55 was significant, Z = 3.10, p = .002. The metaphors numbered 15, 16, 23, 24, 29, 35, 37, and 39 in Table 8 were not correctly classified.

Table 9
Result of the non-cross-validated discriminant analysis for predicting the choice between categorization and comparison models

    Standardized coefficients
      Vehicle conventionality          1.37 (p = .0062)
      Metaphor aptness                −0.28 (p = .54)
      Interpretive diversity           1.47 (p = .0022)
    Wilks's lambda                     0.70
    R²                                 0.30
    Accuracy (correctly predicted)     0.80
    Cohen's kappa                      0.55

Table 10
Classification tables of non-cross-validated and cross-validated discriminant analyses

              Non-cross-validated          Cross-validated
    Actual    Cat    Com    Total          Cat    Com    Total
    Cat         9      2       11            9      2       11
    Com         6     23       29            7     22       29
    Total      15     25       40           16     24       40

    Note. Cat, categorization; Com, comparison. Rows give actual classes; columns give predicted classes.

The left table in Table 10 shows the classification table of the analysis. For the class of categorization, recall was 0.82 (= 9/11) and precision was 0.60 (= 9/15). For the class of comparison, recall was 0.79 (= 23/29) and precision was 0.92 (= 23/25). Therefore, the balanced F-score (i.e., the harmonic mean of recall and precision) was 0.69 for the class of categorization and 0.85 for that of comparison.
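The accuracy, kappa, recall, precision, and F-scores reported above can be reproduced directly from the left classification table in Table 10; a short sketch:

```python
# Left (non-cross-validated) classification table from Table 10;
# keys are (actual, predicted) pairs.
table = {("cat", "cat"): 9, ("cat", "com"): 2,
         ("com", "cat"): 6, ("com", "com"): 23}
n = sum(table.values())  # 40 metaphors

def metrics(table, cls):
    """Recall, precision, and balanced F-score for one class."""
    tp = table[(cls, cls)]
    actual = sum(v for (a, _), v in table.items() if a == cls)
    predicted = sum(v for (_, p), v in table.items() if p == cls)
    recall, precision = tp / actual, tp / predicted
    f_score = 2 * recall * precision / (recall + precision)
    return recall, precision, f_score

accuracy = (table[("cat", "cat")] + table[("com", "com")]) / n  # 0.80
# Cohen's kappa: observed agreement corrected for the chance agreement
# computed from the row and column marginals.
pe = sum((sum(v for (a, _), v in table.items() if a == c) / n) *
         (sum(v for (_, p), v in table.items() if p == c) / n)
         for c in ("cat", "com"))
kappa = (accuracy - pe) / (1 - pe)  # ~0.55
```

Running `metrics(table, "cat")` and `metrics(table, "com")` recovers the reported values 0.82/0.60/0.69 and 0.79/0.92/0.85 (up to rounding).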
Concerning the standardized discriminant coefficients for the three metaphor properties, Table 9 demonstrates that interpretive diversity had the highest discriminant coefficient and was significantly associated with the choice of comprehension process, F(1,36) = 10.89, p < .005. This result is consistent with the interpretive diversity view, which argues that high-diversity metaphors are processed as categorizations and low-diversity metaphors are processed as comparisons. Vehicle conventionality had the second-highest coefficient and also reached statistical significance, F(1,36) = 8.47, p < .01. This result is consistent with
the conventionality view, which predicts that conventional metaphors are processed as categorizations, whereas novel metaphors are processed as comparisons. On the other hand, metaphor aptness did not affect the choice of comprehension process; its standardized coefficient of −0.28 was not significant. This result is not consistent with the aptness view, suggesting that the choice of comprehension strategy may not depend on metaphor aptness. Hence,
the result of the discriminant analysis indicates that both the interpretive diversity view and
the conventionality view are plausible theories of metaphor comprehension.
This result was replicated by the cross-validated discriminant analysis, suggesting
that the finding on the importance of interpretive diversity and conventionality may be
independent of the training data. The leave-one-out cross-validation method was used for
the cross-validated analysis, in which each metaphor was classified using a discriminant
function derived from the remaining 39 metaphors. The classification table for the cross-validated analysis (the right table of Table 10) shows that 31 metaphors (77.5%) were classified correctly and the kappa coefficient of agreement κ = 0.51 was significant, Z = 2.92,
p < .005. All the eight misclassified metaphors in the non-cross-validated analysis were also
misclassified in the cross-validated analysis, and additionally, the metaphor numbered 2 was
misclassified in the cross-validated analysis.
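The leave-one-out procedure can be sketched generically as below. The nearest-class-mean classifier and the one-dimensional scores are hypothetical stand-ins for the study's actual discriminant function and predictors.

```python
def loo_accuracy(xs, ys, fit, predict):
    """Leave-one-out cross-validation: each item is classified by a model
    trained on all remaining items, and the hit rate is returned."""
    correct = 0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        correct += predict(model, xs[i]) == ys[i]
    return correct / len(xs)

# Nearest-class-mean classifier: a crude stand-in for the discriminant
# function used in the study.
def fit_means(xs, ys):
    return {c: sum(x for x, y in zip(xs, ys) if y == c) /
               sum(1 for y in ys if y == c)
            for c in set(ys)}

def predict_nearest(means, x):
    return min(means, key=lambda c: abs(x - means[c]))

# Hypothetical 1-D predictor scores (e.g., interpretive diversity) with
# made-up class labels; not the study's data.
xs = [2.1, 2.4, 2.0, 3.5, 3.8, 3.6, 3.9, 2.2]
ys = ["com", "com", "com", "cat", "cat", "cat", "cat", "com"]
acc = loo_accuracy(xs, ys, fit_means, predict_nearest)
```

Because each held-out item never influences the function that classifies it, leave-one-out accuracy estimates how well the discriminant function generalizes beyond the training data.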
To compare the predictive ability between interpretive diversity and vehicle convention-
ality, I conducted a commonality analysis. Commonality analysis is a method of variation
partitioning by which one can calculate the proportions of variance in the dependent vari-
able associated uniquely with each of the independent variables (i.e., unique contributions
of independent variables to the prediction of the discriminant analysis), as well as the pro-
portions of variance attributed to various combinations of independent variables (i.e., com-
mon contributions of the combinations of variables). Table 11 shows the result of the
commonality analysis. Interpretive diversity made a larger unique contribution (0.212) to
predicting model selection than vehicle conventionality (0.165); this suggested that interpretive diversity may be a more important factor in explaining the choice of comprehension process. The negative common contribution (−0.080) of interpretive diversity and vehicle conventionality indicates that they have no joint effect, or more concretely, that the two variables are competitive in the sense that one variable hinders the contribution of the other
(Legendre & Legendre, 1998). In addition, I conducted two separate discriminant analyses: one considered only interpretive diversity as the independent variable, whereas the other considered only conventionality. The discriminant analysis with interpretive diversity
yielded a significant discriminant function, Wilks's lambda = 0.87, F(1,38) = 5.64, p < .05, which correctly classified 29 (72.5%) metaphors. The kappa coefficient κ = 0.42 showed moderate agreement and was significant, Z = 2.56, p = .01. On the other hand, the discriminant analysis with conventionality did not yield a significant discriminant function, Wilks's lambda = 0.92, F(1,38) = 3.08, p = .09. The function correctly classified only
47.5% of the metaphors and the kappa coefficient was negative, which indicated that there
was no agreement between the prediction and the simulation experiment. These findings
suggest that interpretive diversity may be a better predictor of the metaphor comprehension
process.
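For two predictors, the commonality partition reduces to simple differences of R² values. The sketch below uses hypothetical R² inputs chosen only to reproduce the qualitative pattern in Table 11 (two large unique contributions and a negative common contribution); they are not the study's figures.

```python
def commonality_two_predictors(r2_a, r2_b, r2_ab):
    """Commonality analysis for two predictors A and B: partition the
    full-model R^2 into unique and common (joint) contributions."""
    unique_a = r2_ab - r2_b       # variance explained only by A
    unique_b = r2_ab - r2_a       # variance explained only by B
    common = r2_a + r2_b - r2_ab  # joint part; negative => suppression
    return unique_a, unique_b, common

# Hypothetical R^2 values for diversity (ID) alone, conventionality (VC)
# alone, and the two-predictor model.
unique_id, unique_vc, common = commonality_two_predictors(0.13, 0.08, 0.30)
# common < 0: the two predictors compete rather than overlap
```

A negative common contribution arises exactly when the two predictors together explain more variance than the sum of their separate R² values, which is the suppression pattern discussed above.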
Table 11
Unique and common contributions of three metaphor properties in accounting for the variance in the choice of the metaphor comprehension process

    Unique contributions          Common contributions
    ID      VC      AP      ID & VC    ID & AP    VC & AP    ID, VC, & AP    Sum
    0.212   0.165   0.007   −0.080     0.002      −0.004     −0.006          0.298

    Note. ID, interpretive diversity; VC, vehicle conventionality; AP, aptness.
Furthermore, to justify the finding of the simulation, I examined the effect of the image-
ability of metaphors (Marschark, Katz, & Paivio, 1983), that is, the ease with which the met-
aphorical sentence evokes mental imagery. Various metaphor studies (e.g., Marschark &
Hunt, 1985; Marschark et al., 1983; Paivio & Walsh, 1993) have addressed the role of men-
tal imagery in metaphor comprehension since the very beginning of metaphor research. If
metaphor imageability accounts for a significant portion of the variance in the discriminant
analysis, this weakens the validity of the finding that both the interpretive diversity view
and the conventionality view are plausible. The imageability of the 40 metaphors was rated
by 21 participants on a 7-point scale ranging from 1 (difficult to evoke mental imagery) to
7 (easy to evoke), and the mean imageability rating for each metaphor was used as the inde-
pendent variable of the discriminant analysis. Furthermore, lexical properties (i.e., vehicle
frequency, topic frequency, vehicle concreteness, topic concreteness, vehicle familiarity,
and topic familiarity) were also used as the independent variables. Word concreteness was
obtained by the rating study in which 11 participants rated 40 words used in the 40 meta-
phors on a 7-point scale of concreteness (1: abstract, 7: concrete). On the other hand, word
frequency and familiarity values were derived from the database of Japanese lexical properties "Nihongo No Goi Tokusei." The result of the (non-cross-validated) discriminant analysis was that none of these seven properties were significantly associated with the dependent
variable. This result indicates that the explanatory power of interpretive diversity and con-
ventionality is not attributable to these factors.
In sum, these results support the conclusion that both the interpretive diversity view and
the conventionality view are plausible theories of metaphor comprehension. Additionally,
interpretive diversity emerged as a better predictor of the metaphor comprehension process
in this study, mimicking the experimental results. However, simulations with other meta-
phor data may yield different conclusions.
5. General discussion
The simulation experiment reported in this article demonstrated that the interpretive
diversity view and the conventionality view are plausible, but it did not provide evidence
supporting the aptness view. This computational finding is consistent with the empirical
finding obtained by Utsumi (2007); in his psychological experiment (Experiment 1), both
interpretive diversity and vehicle conventionality were found to be significant predictors of
the choice of comprehension process. Therefore, I have obtained theoretical and experimen-
tal convergence on the conclusion that the interpretive diversity view and the conventional-
ity view are plausible theories of metaphor comprehension. The observed consistency
between empirical and theoretical findings also indicates that the computational methodol-
ogy of this study is potentially useful for providing a new insight into the cognitive pro-
cesses in metaphor comprehension and possibly language comprehension in general. If the
cognitive processes that are being explored can be appropriately modeled in the semantic-
space-based framework, the maximum likelihood method can determine which processes
are plausible.
Why does interpretive diversity (or semantic richness) affect metaphor comprehension?
Utsumi (2007) has provided one possible answer in terms of the nature of categorization.
When people interpret that an entity X is a member of, or classified into, a category Y, entity
X is expected to share many salient features with category Y because the members of a cate-
gory inherit many features of the category by default. In other words, a semantically rich
entity is easy to categorize. Hence, the categorization process proceeds more easily when
more features of category Y can be attributed to X, that is, when a pairing of X and Y is
more diverse. As a result, interpretively diverse metaphors are comprehended via a categori-
zation process, whereas less diverse metaphors fail to be processed as categorizations, and
thus, they must be reinterpreted via a comparison process.
Empirical evidence for the effects of semantic richness or diversity has been established
by a number of studies on language comprehension. Rodd, Gaskell, and Marslen-Wilson
(2002) demonstrated that semantically rich words with many related senses facilitated word
recognition. Similarly, Pexman, Lupker, and Hino (2002) found a number-of-features effect,
that is, faster lexical decision responses, for words with many semantic features than words
with fewer semantic features. Pexman, Holyk, and Monfils (2003) demonstrated that the
number-of-features effect was also observed in the semantic categorization task; semanti-
cally richer words were more quickly judged as a member of a given category, and such an
effect was greater when a given category was broader, in other words, semantically richer.
Furthermore, Adelman et al. (Adelman & Brown, 2008; Adelman, Brown, & Quesada,
2006) have recently demonstrated that contextual diversity—the number of contexts in
which a word appears—affects lexical decision. As semantically richer words will be used
in more variable contexts, contextual diversity can also be considered as a measure of
semantic richness.
Proponents of the embodied theory of cognition have argued that semantic space models such as LSA cannot simulate language comprehension in general (Glenberg & Robertson, 2000; Zwaan & Yaxley, 2003), and metaphor comprehension in particular, because semantic space models are not embodied. If this is true, then this study cannot provide any evidence concerning the cognitive mechanism of metaphor comprehension.
This article responds in two ways to the embodied theory's criticism that semantic space models lack this ability. One way of defending the position that metaphor comprehension can be computationally simulated by semantic space models is to demonstrate that such models can explain linguistic phenomena that, according to the embodied theory, they should not be able to explain. Although
it is still controversial whether semantic space models can represent knowledge based on
embodied experiences and whether they can explain embodied comprehension (de Vega,
Glenberg, & Graesser, 2008), many recent studies have demonstrated that semantic space
models such as LSA (or co-occurrence statistics) are capable of doing so (e.g., Kintsch,
2007, 2008b; Louwerse, 2007, 2008; Louwerse & Van Peer, 2009). For example, Louwerse
(2007) demonstrated that LSA can successfully distinguish non-afforded sentences (e.g., "He used his glasses to dry his feet.") from afforded sentences (e.g., "He used his shirt to dry his feet.") and related sentences (e.g., "He used his towel to dry his feet."); this is in
contrast to Glenberg and Robertson’s (2000) claim that LSA cannot capture such embodied
distinction. Furthermore, the assumption of the embodied metaphor theory that metaphorical
expressions are inevitably linguistic realizations of conceptual metaphors would imply that
linguistic co-occurrence can capture conceptual metaphors. For example, it is highly likely
that Happy Is Up encourages words expressing affective states and words expressing vertical
positions to co-occur in text. It follows that LSA can explain embodied metaphors. Indeed,
Mason (2004) revealed that many conceptual metaphors can be extracted automatically
from a large corpus.
Another way of defending the position that the semantic space models can simulate met-
aphor comprehension is to demonstrate that the role of embodiment in metaphor compre-
hension is more limited than expected by the embodied theory of metaphor. As mentioned
in Section 2.1.1, although there is little doubt that primary metaphors are embodied, it is
highly unclear whether any complex metaphors are embodied or whether they are necessary
for metaphor comprehension. Concerning the need for conceptual metaphors, some negative
findings have established that people do not necessarily comprehend metaphors depending
on conceptual metaphors (Glucksberg & McGlone, 1999; Keysar, Shen, Glucksberg, &
Horton, 2000; Murphy, 1996). Surprisingly, even Barsalou (1999), who adopts an embodied
view of cognition, pointed out that abstract concepts such as anger are directly grounded in
perceptual experience, without being mediated by conceptual metaphors. He also suggested
that conceptual metaphors were not required in this case; familiar or conventional meta-
phors may bypass conceptual metaphors.
From these discussions, it can be concluded that metaphor comprehension can be ade-
quately simulated by the computational models presented in this article. It can be asserted
that the criticism made by the embodied theories does not apply to the framework of this
study.
Over the past few decades, a number of computational studies on metaphor comprehen-
sion have been conducted. Computational studies from the 1990s include computational dis-
crimination among metaphor, metonymy, anomaly, and literalness using lexical semantics
(Fass, 1991); comprehension of predicative metaphors using knowledge about conceptual
metaphors (Martin, 1992); and connectionist implementations of nominal metaphor comprehension (Thomas & Mareschal, 2001) and adjective metaphor comprehension (Weber, 1991).
This study essentially differs from these computational studies in that they did not test
the validity of their computational models in a systematic way; they provided only a small
number of examples, whose plausibility was judged on the basis of the researcher’s insight.
The reason behind this drawback was that the lexical, semantic, or metaphorical knowledge
used in these studies had to be manually coded by the researchers, and therefore, was small
in size.
In recent years, however, very large corpora have become easily available and corpus-
based computational studies on metaphor have been conducted. One corpus-based approach
to metaphor is to automatically build a large knowledge base on conceptual metaphor,
which is used for comprehending predicative metaphors (Martin, 1994; Mason, 2004),
particularly for the technical purpose of dealing with metaphors in an NLP system.
A more important and promising corpus-based approach is to develop a computa-
tional model of metaphor comprehension using a semantic space model constructed
from the statistical analysis of a huge corpus. A pioneering work that follows this
approach is Kintsch’s (2000, 2008a) computational model of metaphor comprehension
based on LSA. Kintsch applied his predication algorithm to metaphor comprehension
and demonstrated that the model can not only compute intuitively reasonable interpreta-
tions of metaphors but also account for some of the phenomena observed in metaphor
comprehension experiments, such as the nonreversibility of metaphors. However, he did
not test the model’s psychological plausibility either in a direct or systematic fashion;
in other words, he did not clarify how well the computed interpretation fits with human
data for metaphor interpretation. Lemaire and Bianco (2003) also employed LSA to
develop a computational model of referential metaphors for simulating the processing
time difference between a metaphorical reference and a literal reference. They modeled
the processing time of referential expressions as the depth of the search for those
neighbors of a referential expression that are also related to a given context. Using this
model, they showed that the simulation result was consistent with empirical data on the
processing time difference between a metaphorical reference and a literal reference
according to a different (literally supportive or metaphorically supportive) context.
However, their model has a notable limitation: It cannot compute the meaning of referential metaphors (i.e., the referent of metaphorical references). Thus, Lemaire and Bianco have not addressed how well their model mimics human interpretations.
In contrast, the LSA-based approach to metaphor presented in this article differs from
these studies in two ways. First, this study employs a quantitative measure of the fit between
the model (i.e., computer interpretations of metaphors) and data (i.e., human interpretations
of the same metaphors) to evaluate the degree to which the computational model imitates
human behavior concerning metaphor comprehension. Second, this study uses a computa-
tional methodology to provide an original contribution to the understanding of the cognitive
mechanisms of metaphor comprehension, rather than to simply retest or confirm the empiri-
cal findings. In other words, this study determines which of the metaphor views are more
plausible, by identifying the view that can best explain the result of the simulation in which
human behavior can be simulated by the model that embodies metaphor comprehension pro-
cess. In contrast, other LSA-based studies only showed whether human behavior could be
simulated by the model that may not embody existing metaphor views. As mentioned in
Section 5.1, the observed consistency between the existing empirical findings and the com-
putational finding of this study provides some support for the usefulness of the computa-
tional methodology of this study for metaphor research.
The semantic-space-based methodology presented in this article has its limitations; these limitations delimit which aspects of metaphor comprehension the simulation results can speak to, beyond the metaphor views tested directly in the simulation experiment.
One important limitation is that the finding obtained in this study does not address the
subtle but crucial differences among various views on the comparison process described in
Section 2.1.1. A crucial perspective according to which these views are differentiated is the
kind of similarities that are preferentially included in the common structure obtained during
the comparison process. For example, Gentner’s structure mapping theory argues that the
comparison process primarily focuses on the relational similarities, whereas Holyoak’s
ACME and LISA argue that semantic and pragmatic similarities are required in the compar-
ison process. At present, the semantic-space-based methodology is less likely to provide an
appropriate technique to compare the plausibility of these views; Ramscar and Yarlett
(2003) suggested that a semantic space model such as LSA does not have sufficient modeling ability for analogical mapping, although it simulates appropriate patterns of analogical retrieval. However, I am somewhat optimistic about this issue, given that Kintsch (2008a) and Mangalath, Quesada, and Kintsch (2004) have demonstrated the possibility of LSA-based modeling of analogical mapping.
Another limitation of the semantic-space-based methodology concerns the time-course of
metaphor comprehension. The semantic space framework is not suitable for simulating the
temporal behavior of the comprehension process, because it does not provide a method for
representing time. Although some product measures such as comprehension speed can be
simulated in this framework (e.g., Lemaire & Bianco, 2003), a fine-grained analysis of the
time-course using eye movement or functional brain mapping cannot be simulated. (Note that this does not mean that the semantic space model cannot model cognitive processes; rather, the extent of time-course detail specified differs between the semantic space model and other models, such as connectionist models [e.g., recurrent networks], that are much more adequate for representing time.) This limitation may be serious for metaphor research, given that a considerable number of metaphor studies (e.g., Gibbs, 1994; Giora, 2003) have been concerned with the time-course of metaphor comprehension.
6. Conclusion
A semantic space model, such as LSA, can provide an effective technique to simu-
late metaphor comprehension processes, such as categorization and comparison. Using
a semantic space model, this study has attempted to determine which of the existing
metaphor views, namely, the conventionality view (Bowdle & Gentner, 2005), aptness
view (Glucksberg & Haught, 2006b), or interpretive diversity view (Utsumi, 2007), is
most plausible. The simulation experiment, which comprised model selection and the-
ory testing, has shown that the interpretive diversity and conventionality views signi-
ficantly account for which of the categorization and comparison models fits better
with the empirical data; this finding indicates that both views are plausible. These
results are consistent with Utsumi’s (2007) empirical findings, and thus, they
strengthen the validity of the interpretive diversity and conventionality views. At the
same time, these results indicate the potential of the semantic-space-based computational
methodology for the cognitive study of language comprehension.
Notes
1. Bowdle and Gentner (2005) refer to this view as the "career of metaphor" hypothesis.
This term emphasizes an evolutionary aspect of metaphor comprehension. When a
metaphor is first used (i.e., it is novel), it is comprehended strictly as comparison.
However, if this metaphor is repeatedly used to convey the same meaning, then this
repeated mapping process gives rise to the creation of an abstract category that
becomes associated with the vehicle. They refer to the process through which a
vehicle term becomes associated with a metaphoric category as conventionalization.
Hence, conventionalization results in an evolutionary shift from comparison to
categorization. Note that in this article I do not use the term ‘‘career of metaphor’’ to refer to
their view, because this study is not directly concerned with an evolutionary aspect of
metaphor comprehension.
2. Content words are words that primarily express lexical meanings; therefore, they can
be represented as vectors in a semantic space. On the other hand, function words, such
as articles, auxiliary verbs, and pronouns, which primarily express grammatical
relationships, are not represented because they have little lexical meaning. Grammatical
functions should not be attributed to the vector representation; they should be consid-
ered in the method for generating a vector representation of a sentence.
3. Formally, given that tf_ij is the frequency of the ith word w_i in the jth document (e.g.,
paragraph) and R is the number of documents, the jth element w_ij of the word vector
for the word w_i is computed by the following formulas:

$$
w_{ij} = \mathrm{tf}_{ij}\left(1 + \frac{\sum_{k=1}^{R} P_{ik}\,\log P_{ik}}{\log R}\right),
\qquad
P_{ij} = \frac{\mathrm{tf}_{ij}}{\sum_{k=1}^{R} \mathrm{tf}_{ik}}.
$$
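The log-entropy weighting defined in Note 3 can be sketched in Python. This is a hypothetical implementation written for illustration; the note specifies only the formulas, and the function and variable names are my own.

```python
import math

def log_entropy_weights(tf):
    """Compute log-entropy weighted word vectors from a term-frequency matrix.

    tf[i][j] is the frequency of word i in document j; R is the number of
    documents.  (Illustrative sketch; the paper defines only the formulas.)
    """
    R = len(tf[0])
    weighted = []
    for row in tf:
        total = sum(row)                       # sum_k tf_ik
        # Entropy term: sum_k P_ik * log(P_ik), with P_ik = tf_ik / total.
        entropy = sum((f / total) * math.log(f / total) for f in row if f > 0)
        g = 1 + entropy / math.log(R)          # global weight for word i
        weighted.append([f * g for f in row])  # w_ij = tf_ij * g
    return weighted
```

Under this scheme a word occurring in only one document keeps its raw frequency (global weight 1), whereas a word spread uniformly over all documents is weighted down to zero, reflecting its low discriminative value.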
5. Table 3 (and other tables) lists some phrases comprising multiple words (e.g., ‘‘disease
onset,’’ ‘‘drug disaster,’’ ‘‘blood sampling’’), which appear inconsistent with the
assumption that the semantic space model can only represent vectors for single words.
However, the Japanese translations of these phrases are single words (e.g.,
‘‘hatsubyo,’’ ‘‘yakugai,’’ ‘‘saiketsu’’), and thus, an inconsistency does not actually
occur.
6. These values are computed by using the semantic space employed in the simulation
experiment, which will be presented in Section 4. The Categ algorithm computed a
sentence vector in Fig. 1 (and also in Fig. 2) with m = 20 and k = 3, whereas the
Compa algorithm computed a sentence vector with k = 3. Kintsch (2001) suggests
that these parameter values work effectively for literal sentences.
7. Using some examples, Kintsch (2001) also showed that the predication algorithm
works well for a model of categorization. For example, he demonstrated that the
vector for ‘‘A pelican is a bird’’ computed by the predication algorithm became
more similar to the features related to bird (e.g., sing beautifully) and less similar
to the features irrelevant to bird (e.g., eat fish and sea) than the original vector of
pelican.
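The predication algorithm discussed in Notes 6 and 7 can be sketched as follows. This is a simplified reading of Kintsch (2001): the m nearest neighbors of the predicate are retrieved, the k of them most similar to the argument are selected, and the sentence vector is the centroid of the argument, the predicate, and those k neighbors. The parameter names m and k follow the text; all other names are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predication(argument, predicate, vocabulary, m=20, k=3):
    """Sketch of Kintsch's (2001) predication algorithm.

    `vocabulary` maps words to vectors in the semantic space.  Details of
    the original algorithm (e.g., spreading activation) are simplified here.
    """
    # The m nearest neighbors of the predicate in the semantic space.
    neighbors = sorted(vocabulary.values(),
                       key=lambda v: cosine(v, predicate), reverse=True)[:m]
    # The k neighbors most relevant to the argument.
    selected = sorted(neighbors,
                      key=lambda v: cosine(v, argument), reverse=True)[:k]
    # Sentence vector: centroid of argument, predicate, and selected neighbors.
    vectors = [argument, predicate] + selected
    return [sum(comp) / len(vectors) for comp in zip(*vectors)]
```

Because the selected neighbors are predicate neighbors relevant to the argument, the resulting vector is pulled toward predicate features that apply to the argument, which is what makes the algorithm behave like attributive categorization in the ‘‘A pelican is a bird’’ example.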
8. In this article, I use a generic notation without a subscript indicating the algorithm
(e.g., h instead of h_cat and h_com, and q instead of q_cat and q_com), if the description is
applicable to both algorithms or models.
9. I thank one of the reviewers for suggesting the possibility of a race between
categorization and comparison.
Acknowledgments
This study was supported by a Grant-in-Aid for Scientific Research C (No. 17500171 and
No. 20500234) from the Ministry of Education, Culture, Sports, Science and Technology. I
thank the associate editor Danielle S. McNamara and four anonymous reviewers for their
insightful comments and suggestions, which helped me improve the article.
References
Adelman, J., & Brown, G. (2008). Modeling lexical decision: The form of frequency and diversity effects.
Psychological Review, 115, 214–229.
Adelman, J., Brown, G., & Quesada, J. (2006). Contextual diversity, not word frequency, determines word-
naming and lexical decision times. Psychological Science, 17, 814–823.
Barsalou, L. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 577–660.
Barsalou, L. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645.
Bishop, C. (2006). Pattern recognition and machine learning. New York: Springer.
Blasko, D., & Connine, C. (1993). Effects of familiarity and aptness on metaphor understanding. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 19(2), 295–308.
Bowdle, B., & Gentner, D. (2005). The career of metaphor. Psychological Review, 112, 193–216.
Cameron, L. (2008). Metaphor and talk. In R. Gibbs (Ed.), The Cambridge handbook of metaphor and thought
(pp. 197–211). New York: Cambridge University Press.
Carston, R. (2002). Thoughts and utterances: The pragmatics of explicit communication. Oxford, England:
Blackwell.
Chiappe, D., & Kennedy, J. (1999). Aptness predicts preference for metaphors or similes, as well as recall bias.
Psychonomic Bulletin & Review, 6, 668–676.
Chiappe, D., Kennedy, J., & Smykowski, T. (2003). Reversibility, aptness, and the conventionality of metaphors
and similes. Metaphor and Symbol, 18, 85–105.
Clausner, T., & Croft, W. (1997). Productivity and schematicity in metaphors. Cognitive Science, 21,
247–282.
Cover, T., & Thomas, J. (2006). Elements of information theory (2nd ed.). Hoboken, NJ: John Wiley & Sons.
Deerwester, S., Dumais, S., Furnas, G., Landauer, T., & Harshman, R. (1990). Indexing by latent semantic
analysis. Journal of the American Society for Information Science, 41, 391–407.
Fass, D. (1991). Met*: A method for discriminating metonymy and metaphor by computer. Computational
Linguistics, 17, 49–90.
Foltz, P., Kintsch, W., & Landauer, T. (1998). The measurement of textual coherence with latent semantic
analysis. Discourse Processes, 25, 285–307.
Gentner, D. (1983). Structure mapping: A theoretical framework for analogy. Cognitive Science, 7, 155–170.
Gentner, D. (1989). The mechanisms of analogical learning. In S. Vosniadou & A. Ortony (Eds.), Similarity and
analogical reasoning (pp. 199–241). Cambridge, England: Cambridge University Press.
Gentner, D., & Bowdle, B. (2008). Metaphor as structure mapping. In R. Gibbs (Ed.), The Cambridge handbook
of metaphor and thought (pp. 109–128). New York: Cambridge University Press.
Gentner, D., & Markman, A. (1997). Structure mapping in analogy and similarity. American Psychologist, 52,
45–56.
Gentner, D., & Wolff, P. (1997). Alignment in the processing of metaphor. Journal of Memory and Language,
37, 331–355.
Gentner, D., Bowdle, B., Wolff, P., & Boronat, C. (2001). Metaphor is like analogy. In D. Gentner, K. Holyoak,
& B. Kokinov (Eds.), Analogical mind: Perspectives from cognitive science (pp. 199–253). Cambridge, MA:
MIT Press.
Gibbs, R. (1994). The poetics of mind. Cambridge, England: Cambridge University Press.
Gibbs, R. (2006). Embodiment and cognitive science. New York: Cambridge University Press.
Gibbs, R. W. (Ed.). (2008). The Cambridge handbook of metaphor and thought. New York: Cambridge
University Press.
Giora, R. (2003). On our mind: Salience, context, and figurative language. New York: Oxford University Press.
Glenberg, A., & Robertson, D. (2000). Symbol grounding and meaning: A comparison of high-dimensional and
embodied theories of meaning. Journal of Memory and Language, 43, 379–401.
Glucksberg, S. (2001). Understanding figurative language: From metaphors to idioms. New York: Oxford
University Press.
Glucksberg, S. (2003). The psycholinguistics of metaphor. Trends in Cognitive Sciences, 7, 92–96.
Glucksberg, S., & Haught, C. (2006a). Can Florida become like the next Florida? When metaphoric comparisons
fail. Psychological Science, 17, 935–938.
Glucksberg, S., & Haught, C. (2006b). On the relation between metaphor and simile: When comparison fails.
Mind & Language, 21, 360–378.
Glucksberg, S., & Keysar, B. (1990). Understanding metaphorical comparisons: Beyond similarity. Psycho-
logical Review, 97, 3–18.
Glucksberg, S., & McGlone, M. (1999). When love is not a journey: What metaphors mean. Journal of Pragmat-
ics, 31, 1541–1558.
Glucksberg, S., McGlone, M., & Manfredi, D. (1997). Property attribution in metaphor comprehension. Journal
of Memory and Language, 36, 50–67.
Grady, J. (1997). Theories are buildings revisited. Cognitive Linguistics, 8, 267–290.
Grady, J. (2005). Primary metaphors as inputs to conceptual integration. Journal of Pragmatics, 37, 1595–1614.
Holyoak, K. J., & Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13,
295–355.
Hu, B., Kalfoglou, Y., Alani, H., Dupplaw, D., Lewis, P., & Shadbolt, N. (2006). Semantic metrics. In S. Staab
& V. Svatek (Eds.), Proceedings of the 15th international conference on knowledge engineering and knowl-
edge management (EKAW 2006) (pp. 166–181). Berlin: Springer.
Huettig, F., Quinlan, P. T., McDonald, S. A., & Altmann, G. T. (2006). Models of high-dimensional
semantic space predict language-mediated eye movements in the visual world. Acta Psychologica, 121,
65–80.
Hummel, J. E., & Holyoak, K. J. (1997). Distributed representations of structure: A theory of analogical access
and mapping. Psychological Review, 104, 427–466.
Jones, L., & Estes, Z. (2005). Metaphor comprehension as attributive categorization. Journal of Memory and
Language, 53, 110–124.
Jones, L., & Estes, Z. (2006). Roosters, robins, and alarm clocks: Aptness and conventionality in metaphor
comprehension. Journal of Memory and Language, 55, 18–32.
Jones, M. N., Kintsch, W., & Mewhort, D. J. (2006). High-dimensional semantic space accounts of priming.
Journal of Memory and Language, 55, 534–552.
Kawamoto, A. (1993). Nonlinear dynamics in the resolution of lexical ambiguity: A parallel distributed process-
ing account. Journal of Memory and Language, 32, 474–516.
Keysar, B., Shen, Y., Glucksberg, S., & Horton, W. (2000). Conventional language: How metaphorical is it?
Journal of Memory and Language, 43, 576–593.
Kintsch, W. (1998). Comprehension: A paradigm for cognition. New York: Cambridge University Press.
Kintsch, W. (2000). Metaphor comprehension: A computational theory. Psychonomic Bulletin & Review, 7,
257–266.
Kintsch, W. (2001). Predication. Cognitive Science, 25, 173–202.
Kintsch, W. (2007). Meaning in context. In T. Landauer, D. McNamara, S. Dennis, & W. Kintsch (Eds.), Hand-
book of latent semantic analysis (pp. 89–105). Mahwah, NJ: Lawrence Erlbaum Associates.
Kintsch, W. (2008a). How the mind computes the meaning of metaphor: A simulation based on LSA. In
R. Gibbs (Ed.), The Cambridge handbook of metaphor and thought (pp. 129–142). New York: Cambridge
University Press.
Kintsch, W. (2008b). Symbol systems and perceptual representations. In M. de Vega, A. Glenberg, & A.
Graesser (Eds.), Symbols and embodiment: Debates on meaning and cognition (pp. 145–163). New York:
Oxford University Press.
Kövecses, Z. (2002). Metaphor: A practical introduction. New York: Oxford University Press.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: The University of Chicago Press.
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to western
thought. New York: Basic Books.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of
the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.
Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automated scoring and annotation of essays with the Intelli-
gent Essay Assessor. In M. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary
perspective (pp. 87–112). Mahwah, NJ: Lawrence Erlbaum Associates.
Landauer, T., McNamara, D., Dennis, S., & Kintsch, W. (2007). Handbook of latent semantic analysis. Mahwah,
NJ: Lawrence Erlbaum Associates.
Larkey, L. B., & Love, B. C. (2003). CAB: Connectionist analogy builder. Cognitive Science, 27, 781–
794.
Legendre, P., & Legendre, L. (1998). Numerical ecology (2nd English ed.). Amsterdam: Elsevier Science
B.V.
Lemaire, B., & Bianco, M. (2003). Contextual effects on metaphor comprehension: Experiment and simulation.
In F. Detje, D. Dörner, & H. Schaub (Eds.), Proceedings of the 5th international conference on cognitive
modeling (ICCM2003) (pp. 153–158). Germany: Universitäts-Verlag Bamberg.
Lin, D. (1998). Automatic retrieval and clustering of similar words. In Proceedings of the 17th international
conference on computational linguistics and the 36th annual meeting on association for computational lin-
guistics (pp. 768–774). Montreal, Canada: ACL.
Louwerse, M. (2007). Symbolic or embodied representations: A case for symbol interdependency. In
T. Landauer, D. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp.
107–120). Mahwah, NJ: Lawrence Erlbaum Associates.
Louwerse, M. (2008). Embodied relations are encoded in language. Psychonomic Bulletin & Review, 15, 838–
844.
Louwerse, M., & Van Peer, W. (2009). How cognitive is cognitive poetics? The interaction between symbolic
and embodied cognition. In G. Brone & J. Vandaele (Eds.), Cognitive poetics: Goals, gains and gaps (pp.
423–444). Berlin: Mouton de Gruyter.
Lowe, W., & McDonald, S. (2000). The direct route: Mediated priming in semantic space. In L. Gleitman &
A. Joshi (Eds.), Proceedings of the 22nd annual meeting of the cognitive science society (pp. 806–811).
Austin, TX: Cognitive Science Society.
Mangalath, P., Quesada, J., & Kintsch, W. (2004). Analogy-making as predication using relational information
and LSA vectors. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the 26th Annual Meeting of
the Cognitive Science Society (CogSci2004) (p. 1623). Austin, TX: Cognitive Science Society.
Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA:
MIT Press.
Marschark, M., & Hunt, R. (1985). On memory for metaphor. Memory and Cognition, 13, 413–424.
Marschark, M., Katz, A., & Paivio, A. (1983). Dimensions of metaphor. Journal of Psycholinguistic Research,
12(1), 17–40.
Martin, J. (1992). Computer understanding of conventional metaphoric language. Cognitive Science, 16,
233–270.
Martin, J. (1994). Metabank: A knowledge-base of metaphoric language conventions. Computational Intelli-
gence, 10, 134–149.
Mason, Z. (2004). CorMet: A computational, corpus-based conventional metaphor extraction system. Computa-
tional Linguistics, 30, 23–44.
McClelland, J. (2009). The place of modeling in cognitive science. Topics in Cognitive Science, 1, 11–38.
McRae, K., de Sa, V. R., & Seidenberg, M. S. (1997). On the nature and scope of featural representations of
word meaning. Journal of Experimental Psychology: General, 126(2), 99–130.
Murphy, G. (1996). On metaphoric representation. Cognition, 60, 173–204.
Padó, S., & Lapata, M. (2007). Dependency-based construction of semantic space models. Computational
Linguistics, 33, 161–199.
Paivio, A., & Walsh, M. (1993). Psychological processes in metaphor comprehension and memory. In A. Ortony
(Ed.), Metaphor and thought (2nd ed.) (pp. 307–328). Cambridge, England: Cambridge University
Press.
Pecher, D., & Zwaan, R. (2005). Grounding cognition: The role of perception and action in memory, language
and thinking. Cambridge, England: Cambridge University Press.
Pexman, P., Lupker, S., & Hino, Y. (2002). The impact of feedback semantics in visual word recognition: Num-
ber-of-features effects in lexical decision and naming tasks. Psychonomic Bulletin & Review, 9, 542–549.
Pexman, P., Holyk, G., & Monfils, M.-H. (2003). Number-of-features effects and semantic processing. Memory
& Cognition, 31, 842–855.
Ramscar, M., & Yarlett, D. (2003). Semantic grounding in models of analogy: An environmental approach.
Cognitive Science, 27, 41–71.
Rodd, J., Gaskell, G., & Marslen-Wilson, W. (2002). Making sense of semantic ambiguity: Semantic
competition in lexical access. Journal of Memory and Language, 46, 245–266.
Rowe, M., & McNamara, D. (2008). Inhibition needs no negativity: Negative links in the construction-
integration model. In B. Love, K. McRae, & V. Sloutsky (Eds.), Proceedings of the 30th Annual Meeting of
the Cognitive Science Society (CogSci2008) (pp. 1777–1782). Austin, TX: Cognitive Science Society.
Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24, 97–123.
Shahnaz, F., Berry, M., Pauca, V., & Plemmons, R. (2006). Document clustering using nonnegative matrix
factorization. Information Processing and Management, 42, 373–386.
Smith, E., Osherson, D., Rips, L., & Keane, M. (1988). Combining prototypes: A selective modification model.
Cognitive Science, 12, 485–527.
Sperber, D., & Wilson, D. (1995). Relevance: Communication and cognition (2nd ed.). Oxford, England:
Blackwell.
Sperber, D., & Wilson, D. (2008). A deflationary account of metaphors. In R. Gibbs (Ed.), The Cambridge
handbook of metaphor and thought (pp. 84–105). New York: Cambridge University Press.
Thomas, M., & Mareschal, D. (2001). Metaphor as categorization: A connectionist implementation. Metaphor
and Symbol, 16, 5–27.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
Utsumi, A. (2005). The role of feature emergence in metaphor appreciation. Metaphor and Symbol, 20,
151–172.
Utsumi, A. (2007). Interpretive diversity explains metaphor-simile distinction. Metaphor and Symbol, 22,
291–312.
de Vega, M., Glenberg, A., & Graesser, A. (2008). Symbols and embodiment: Debates on meaning and cogni-
tion. New York: Oxford University Press.
Wagenmakers, E., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin &
Review, 11, 192–196.
Weber, S. (1991). A connectionist model of literal and figurative adjective noun combinations. In D. Fass,
E. Hinkelman, & J. Martin (Eds.), Proceedings of the IJCAI workshop on computational approaches to non-
literal language: Metaphor, metonymy, idioms, speech acts, implicature (pp. 151–160). Sydney, Australia:
IJCAI.
Widdows, D. (2004). Geometry and meaning. Stanford, CA: CSLI Publications.
Wolff, P., & Gentner, D. (2000). Evidence for role-neutral initial processing of metaphors. Journal of Experi-
mental Psychology: Learning, Memory, and Cognition, 26, 529–541.
Zwaan, R., & Yaxley, R. (2003). Spatial iconicity affects semantic relatedness judgments. Psychonomic Bulletin
& Review, 10, 954–958.
participants. Participants performed three subtasks, namely, a feature listing task, a free
description task, and a comprehensibility rating task; however, this study used only the data
obtained in the feature listing task. In this task, participants were asked to consider the
meaning of each metaphor and to list, in words or phrases, three or more features (i.e.,
meanings) of the topic that they thought were involved in the interpretation of the
metaphor.
After the metaphor comprehension experiment, the following preprocessing was
conducted for each metaphor M to obtain the final list of metaphorical meaning W(M). First,
a list of the features generated in the metaphor comprehension experiment was compiled
for each metaphor M. Then, closely related words or phrases in this list were merged into
a single feature if they met any of the following four criteria: (a)
they belonged to the same deepest category of a Japanese thesaurus Bunrui Goi Hyo (e.g.,
kakasenai and hitsuyoufukaketsu in Japanese, both of which mean being indispensable); (b)
they shared the same root form (e.g., red [akai in Japanese] and redness [akasa in
Japanese]); (c) they differed only in degree because of an intensive modifier (e.g., frightened
and quite frightened); or (d) a dictionary description of one word included the other word or
phrase (e.g., lie and not true). After this feature combination process, any feature mentioned
by only one participant was eliminated from the list of features. The list of features amended
by this preprocessing was used as the list of meanings W(M) in this study.
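The merging-and-filtering procedure just described can be sketched as follows. Here `same_feature` stands in for the four merging criteria (thesaurus category, shared root form, intensifier difference, dictionary inclusion); both the function names and the data layout are illustrative, not the paper's implementation.

```python
def merge_and_filter_features(features, same_feature):
    """Sketch of the preprocessing that yields the meaning list W(M).

    `features` is a list of (participant_id, feature_phrase) pairs, and
    `same_feature(a, b)` returns True when two phrases meet any of the
    four merging criteria.  (Hypothetical names for illustration.)
    """
    # Merge closely related phrases into clusters; the first phrase
    # encountered serves as the cluster's canonical label.
    clusters = []  # each cluster: (canonical_phrase, set_of_participants)
    for pid, phrase in features:
        for cluster in clusters:
            if same_feature(cluster[0], phrase):
                cluster[1].add(pid)
                break
        else:
            clusters.append((phrase, {pid}))
    # Eliminate any feature mentioned by only one participant.
    return [phrase for phrase, pids in clusters if len(pids) >= 2]
```

For example, with a `same_feature` that matches shared root forms, ‘‘red’’ and ‘‘redness’’ listed by two different participants collapse into one surviving feature, while a phrase listed by a single participant is dropped.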
The rating experiment comprised three rating tasks (vehicle conventionality, metaphor
aptness, and similarity) for metaphors and similes (Utsumi, 2007). The simulation experi-
ment in this study required only the conventionality and aptness ratings of metaphor. For
vehicle conventionality and metaphor aptness, 144 Japanese undergraduate students of the
University of Electro-Communications were recruited, and each was assigned 10 metaphors. One half
of these students performed the conventionality rating task, whereas the other half
performed the aptness rating task. In the conventionality rating task, participants were given
a list of the vehicle terms of the metaphors, each paired with its most salient meaning, and were asked to rate
how conventional each meaning was as an alternative sense of the vehicle term on a scale of
1 (very novel) to 7 (very conventional). For example, as the meaning ephemeral was listed
by the largest number of participants for ‘‘Death is the fog,’’ the participants of this task
were asked the following question: ‘‘When we say that something (X) is the fog, how
conventional is the interpretation that this is something (X) that is ephemeral?’’ This method
of assessing vehicle conventionality was identical to the method used by Bowdle and
Gentner (2005). In the aptness rating task, participants were asked to rate how apt each
metaphor was, on a 7-point scale ranging from 1 (not at all apt) to 7 (extremely apt).
Following previous research (Jones & Estes, 2006), this study defined aptness as the extent
to which the metaphor captured the important features of the topic. These ratings were then
averaged across participants for each metaphor.