recall. Table 2 presents the results of event learning on our two crowdsourced corpora, using the MUC6 cluster scoring metric (Vilain et al. 1995) to match actual cluster results against the gold standard. These values were obtained using parameter optimization to select the optimal weights for the clustering similarity function. The ideal weights for a given situation naturally depend on language usage and the degree to which variability in event ordering can occur. Table 2 shows how each portion of our algorithm helps to increase accuracy. Initial cluster seeding makes use of shallow constraint information. The semantic similarity columns show how phrase expansion improves our clusters. Event location further increases cluster accuracy by incorporating information contained in the implicit ordering of events from the example narratives. For each set of results, we show the average precision, recall, and F1 score for the best weightings for verb, noun, and event location similarity components.

Noting the differences between data sets, the movie date corpus has a significantly greater number of unique verbs and nouns, longer narratives, and greater usage of colloquial language. Interestingly, the movie date corpus contains a number of non-prototypical events about social interactions (e.g., John attempts to kiss Sally) that appear rarely. The greater number of clusters containing few steps has a negative effect on recall values; a larger number of narratives would ameliorate this effect by providing more examples of rare steps. By crowdsourcing a highly specialized corpus, we are able to maintain precision in the face of a more complicated situation without restricting worker ability to express their conception of the salient points of the situation.

While we believe that our event learning process achieves acceptably high accuracy rates, errors in event clustering may impact overall script learning performance (the effects of clustering errors on script learning will be discussed in a later section). To improve event-clustering accuracy, we can adopt a technique to improve cluster quality using a second round of crowdsourcing, similar to that proposed by Boujarwah, Abowd, and Arriaga (2012). Workers can be tasked with inspecting the members of a cluster and marking those that do not belong. If there is sufficient agreement about a particular step, it is removed from the cluster. A second round of crowdsourcing is then used to task workers with identifying which cluster these “un-clustered” steps should be placed into. Crowdsourcing is often used to improve on artificial intelligence results (von Ahn 2005), and we can increase clustering accuracy to near perfect in this way. However, in the long term our goal is to minimize the use of the crowd so as to speed up script acquisition and reduce costs.

Plot Graph Learning

Once we have the events, the next stage is to learn the script structure. Following Chambers and Jurafsky (2009), we learn before relations B(e1, e2) between all pairs of events e1 and e2. Chambers and Jurafsky train their model on the Timebank corpus (Pustejovsky et al. 2003), which uses temporal signal words. Because we are able to leverage a highly specialized corpus of narrative examples of the desired situation, we can probabilistically determine ordering relations between events directly from the crowdsourced narrative examples. The result of this process is a script-like structure called a plot graph (Weyhrauch 1997), a partial ordering of events that defines a space of possible event sequences that can unfold during a given situation.

Initial Script Construction

Script construction is the process of identifying the plot graph that most compactly and accurately explains the set of crowdsourced narrative examples. Each possible before relation between a pair of events is a hypothesis (i.e., B(e1, e2) = true or B(e2, e1) = true) that must be verified. For every pair of events e1 and e2, we count the observations of evidence for and against each hypothesis. Let s1 be a step in the cluster representing event e1, and let s2 be a step in the cluster representing event e2. If s1 and s2 appear in the same input narrative, and s1 appears before s2, then we consider this an observation in support of B(e1, e2) = true. If s2 appears before s1 in the same narrative, the observation supports B(e2, e1) = true.

The probability of a particular hypothesis h is ph = k/n, where n is the number of observations and k is the number of observations that support h. We also measure the confidence of each hypothesis (cf. Wang 2009); a small number of observations for a hypothesis will result in low confidence. We cannot assume prior distributions of orderings between arbitrary events, so we use the imprecise Dirichlet model (Walley 1996) to represent the uncertainty of opposing relations as the interval between the most pessimistic and the most optimistic estimates. The upper and lower estimates of the probability are (k + s)/(n + s) and k/(n + s), respectively, where the parameter s can be considered as a number of observations whose outcomes are hidden. Our confidence in a probability is 1 − s/(n + s) = n/(n + s), the complement of the width of this interval.

We select relations for the plot graph in which the probability and confidence exceed thresholds Tp, Tc ∈ [0, 1], respectively. Tp and Tc apply to the entire graph and provide an initial estimate of the best plot graph. However, a graph which better explains the crowdsourced narratives may be found if the thresholds could be locally relaxed for particular relations. Below, we introduce a measure of plot graph error and an algorithm for iteratively improving the plot graph to minimize the error.

Q ← all pairs of events (e1, e2) where e2 is reachable from e1 or unordered
Foreach (e1, e2) ∈ Q in order of decreasing DN(e1, e2) − DG(e1, e2) do:
    E ← all events ei such that DG(e1, ei) = DN(e1, e2) − 1
    Foreach ei ∈ E do:
        If edge ei → e2 has probability and confidence less than Tp, Tc
        and will not create a cycle if added to the graph do:
            Strengthen the edge by adding one observation in support of it
            If ei → e2 now has probability and confidence greater than Tp, Tc
            and adding ei → e2 to the graph decreases MSGE do:
                Add ei → e2 to the graph
Return graph

Figure 2. The plot graph improvement algorithm.
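The initial script construction described above — counting before-observations across narratives, estimating each hypothesis with the imprecise Dirichlet model, and keeping relations that pass both thresholds — can be sketched as follows. This is a minimal illustration, not the authors' implementation: narratives are assumed to arrive as sequences of event-cluster ids, and the values of s, Tp, and Tc are arbitrary choices for the example.

```python
from collections import defaultdict
from itertools import combinations

def learn_before_relations(narratives, s=1.0, t_p=0.7, t_c=0.6):
    """Count ordering observations and keep confident before relations.

    narratives: list of event sequences (each a list of event-cluster ids).
    s: imprecise-Dirichlet parameter (hidden observations); illustrative value.
    t_p, t_c: probability and confidence thresholds; illustrative values.
    """
    support = defaultdict(int)  # support[(a, b)] = observations of a before b
    for narrative in narratives:
        for i, a in enumerate(narrative):
            for b in narrative[i + 1:]:
                if a != b:
                    support[(a, b)] += 1

    events = {e for narrative in narratives for e in narrative}
    edges = {}
    for a, b in combinations(sorted(events), 2):
        k, k_rev = support[(a, b)], support[(b, a)]
        n = k + k_rev  # total observations relating a and b
        if n == 0:
            continue
        if k_rev > k:  # orient the hypothesis toward the majority ordering
            a, b, k = b, a, k_rev
        p = k / n                          # point estimate for B(a, b)
        lower = k / (n + s)                # pessimistic IDM estimate
        upper = (k + s) / (n + s)          # optimistic IDM estimate
        confidence = 1 - (upper - lower)   # = n / (n + s)
        if p >= t_p and confidence >= t_c:
            edges[(a, b)] = (p, confidence)
    return edges

# Two toy restaurant narratives that agree on everything except eat/pay order.
example = [["enter", "order", "eat", "pay"],
           ["enter", "order", "pay", "eat"]]
print(learn_before_relations(example))
```

With these toy narratives, the contested eat/pay pair gets p = 0.5 and is filtered out, while the consistently ordered pairs survive, mirroring how the thresholds prune weakly supported relations.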
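The cycle test in Figure 2 ("will not create a cycle if added to the graph") reduces to a reachability check: adding an edge ei → e2 closes a cycle exactly when ei is already reachable from e2. A minimal sketch, assuming an adjacency-list representation of the plot graph:

```python
def creates_cycle(graph, src, dst):
    """Return True if adding edge src -> dst would create a cycle.

    graph: dict mapping each node to a list of successor nodes.
    The new edge closes a cycle exactly when src is reachable from dst.
    """
    stack, seen = [dst], set()
    while stack:  # iterative depth-first search from dst
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False

g = {"enter": ["order"], "order": ["pay"]}
print(creates_cycle(g, "enter", "pay"))  # prints False: forward edge is safe
print(creates_cycle(g, "pay", "enter"))  # prints True: would close a cycle
```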
Figure 4. A plot graph generated for the restaurant situation.

… observations of ordering relations between otherwise unrelated events, possibly causing cycles in the plot graph. If the number of improperly clustered sentences is relatively small, these relations have low probability and confidence and will be filtered out. Second, two distinct events may be merged into one event, causing ordering cycles in which all edges have high probability and confidence. When this happens, it is theoretically possible to eliminate the cycle by splitting the event cluster in the cycle with the highest inter-cluster variance. We have not yet implemented this procedure, however. Third, an event may be split into two clusters unordered relative to each other. This creates the appearance that an event must occur twice in a situation accepted by the script.

Closely inspecting Figure 4, we note that before relations do not always imply strict causal necessity. For example, before placing an order at a fast-food restaurant one can wait in line or drive to the drive-thru, but not both. Either event is sufficient for placing an order. They both appeared in the graph because there are two main variations to the situation, walk-in and drive-through (this …

Conclusions

Human computation and crowdsourcing provide direct access to humans and the ways they express experiential knowledge. Guided by our five requirements for acquiring sociocultural knowledge, we demonstrated that we could obtain reasonable scripts of common sociocultural situations. Cost-effectiveness is due to automated aggregation of worker effort. Natural interactions are achieved by allowing humans to express their knowledge intuitively as narrative. Our technique is tolerant of situational variations and can accommodate omitted events that are natural consequences of human elicitation of commonsense knowledge. Proactivity is left for future work, although we have identified how human computation can be advantageous. The potential contributions of our preliminary work are (a) the use of stories as a means for knowledge transfer from human to computer via a specialized corpus, (b) the use of explicit instructions to crowd workers to control for natural language use, and (c) a procedure for compiling script-like knowledge structures from story examples. We believe that this is a first step toward rapidly and automatically acquiring functional sociocultural knowledge through the use of anonymous human storytellers. Our approach has the potential to significantly alleviate the knowledge-authoring bottleneck that has limited many practical intelligent systems.