
Automatic summarisation in the Information Age

Constantin Orasan
Research Group in Computational Linguistics
Research Institute in Information and Language Processing
University of Wolverhampton
http://www.wlv.ac.uk/~in6093/
http://www.summarizationonline.info
12th Sept 2009
Structure of the course
1 Introduction to automatic summarisation
2 Important methods in automatic summarisation
3 Automatic summarisation and the Internet
Structure of the course
1 Introduction to automatic summarisation
What is a summary?
What is automatic summarisation?
Context factors
Evaluation
General information about evaluation
Direct evaluation
Target-based evaluation
Task-based evaluation
Automatic evaluation
Evaluation conferences
2 Important methods in automatic summarisation
3 Automatic summarisation and the Internet
What is a summary?
Abstract of a scientific paper
Source: (Sparck Jones, 2007)
Summary of a news event
Source: Google news http://news.google.com
Summary of a web page
Source: Bing http://www.bing.com
Summary of financial news
Source: Yahoo! Finance http://finance.yahoo.com/
Maps
Source: Google Maps http://maps.google.co.uk/
Summaries in everyday life
Headlines: summaries of newspaper articles
Table of contents: summary of a book, magazine
Digest: summary of stories on the same topic
Highlights: summary of an event (meeting, sport event, etc.)
Abstract: summary of a scientific paper
Bulletin: weather forecast, stock market, news
Biography: resume, obituary
Abridgment: of books
Review: of books, music, plays
Scale-downs: maps, thumbnails
Trailer: of a film or speech
Summaries in the context of this tutorial
summaries are produced from the text of one or several documents
the summary is a text or a list of sentences
Definitions of summary
"an abbreviated, accurate representation of the content of a document preferably prepared by its author(s) for publication with it. Such abstracts are also useful in access publications and machine-readable databases" (American National Standards Institute Inc., 1979)
"an abstract summarises the essential contents of a particular knowledge record, and it is a true surrogate of the document" (Cleveland, 1983)
"the primary function of abstracts is to indicate and predict the structure and content of the text" (van Dijk, 1980)
Definitions of summary (II)
"the abstract is a time saving device that can be used to find a particular part of the article without reading it; [...] knowing the structure in advance will help the reader to get into the article; [...] as a summary of the article, it can serve as a review, or as a clue to the content. Also, an abstract gives an exact and concise knowledge of the total content of the very much more lengthy original, a factual summary which is both an elaboration of the title and a condensation of the report [...] if comprehensive enough, it might replace reading the article for some purposes" (Graetz, 1985).
these definitions refer to human-produced summaries
Definitions for automatic summaries
these definitions are less ambitious
"a concise representation of a document's content to enable the reader to determine its relevance to a specific information need" (Johnson, 1995)
"a summary is a text produced from one or more texts, that contains a significant portion of the information in the original text(s), and is not longer than half of the original text(s)." (Hovy, 2003)
What is automatic summarisation?
What is automatic (text) summarisation?
Text summarisation
"a reductive transformation of source text to summary text through content reduction by selection and/or generalisation on what is important in the source." (Sparck Jones, 1999)
"the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks)." (Mani and Maybury, 1999)
Automatic text summarisation = the process of producing summaries automatically.
Related disciplines
There are many disciplines which are related to automatic
summarisation:
automatic categorisation/classification
term/keyword extraction
information retrieval
information extraction
question answering
text generation
data/opinion mining
Automatic categorisation/classification
Automatic text categorisation
is the task of building software tools capable of classifying text documents under predefined categories or subject codes
each document can be in one or several categories
examples of categories: Library of Congress subject headings
Automatic text classification
is usually considered broader than text categorisation
includes text clustering and text categorisation
it does not necessarily require the classes to be known in advance
Examples: email/spam filtering, routing
Term/keyword extraction
automatically identifies terms/keywords in texts
a term is a word or group of words which is important in a domain and represents a concept of the domain
a keyword is an important word in a document, but it is not necessarily a term
terms and keywords are extracted using a mixture of statistical and linguistic approaches (see the sketch below)
automatic indexing identifies all the relevant occurrences of a keyword in texts and produces indexes
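To make the statistical side concrete, here is a minimal sketch of keyword extraction by TF-IDF scoring. The toy corpus and the smoothed IDF formula are illustrative assumptions, not part of the original tutorial; real extractors layer linguistic filters (part-of-speech patterns, lemmatisation) on top of such scores.

import math
from collections import Counter

def tfidf_keywords(doc, corpus, top_n=3):
    # doc: list of tokens; corpus: list of tokenised documents (doc's peers)
    tf = Counter(doc)
    n_docs = len(corpus)
    def idf(word):
        df = sum(1 for d in corpus if word in d)   # document frequency
        return math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF (an assumption)
    scores = {w: (f / len(doc)) * idf(w) for w, f in tf.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

docs = [
    "automatic summarisation reduces a source text".split(),
    "keyword extraction finds important words in a document".split(),
    "summarisation selects important sentences from the text".split(),
]
print(tfidf_keywords(docs[1], docs))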
Information retrieval (IR)
Information retrieval attempts to find information relevant to a user query and rank it according to its relevance
the output is usually a list of documents, in some cases together with relevant snippets from the documents
Example: search engines
needs to be able to deal with enormous quantities of information and process information in any format (e.g. text, image, video, etc.)
is a field which has achieved a level of maturity and is used in industry and business
combines statistics, text analysis, link analysis and user interfaces
Information extraction (IE)
Information extraction is the automatic identification of predefined types of entities, relations or events in free text
quite often the best results are obtained by rule-based approaches, but machine learning approaches are used more and more
can generate database records
is domain dependent
this field developed considerably as a result of the MUC conferences
one of the tasks in the MUC conferences was to fill in templates
Example: "Ford appointed Harriet Smith as president"
Person: Harriet Smith
Job: president
Company: Ford
Question answering (QA)
Question answering aims at identifying the answer to a question in a large collection of documents
the information provided by QA is more focused than in information retrieval
a QA system should be able to answer any question and should not be restricted to a domain (like IE)
the output can be the exact answer or a text snippet which contains the answer
the field took off as a result of the introduction of the QA track in TREC
user-focused summarisation = open-domain question answering
Text generation
Text generation creates text from computer-internal
representations of information
most generation systems rely on massive amounts of linguistic
knowledge and manually encoded rules for translating the
underlying representation into language
text generation systems are very domain dependent
Data mining
Data mining is the (semi)automatic discovery of trends,
patterns or unusual data across very large data sets, usually
for the purposes of decision making
Text mining applies methods from data mining to textual
collections
Processes really large amounts of data in order to find useful information
In many cases it is not known (clearly) what is sought
Visualisation has a very important role in data mining
Opinion mining
Opinion mining (OM) is a recent discipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses
is usually applied to collections of documents (e.g. blogs) and seen as part of text/data mining
Sentiment Analysis, Sentiment Classification and Opinion Extraction are other names used in the literature for this discipline
Examples of OM problems:
What is the general opinion on the proposed tax reform?
How is popular opinion on the presidential candidates evolving?
Which of our customers are unsatisfied? Why?
Characteristics of summaries
Context factors
the context factors defined by Sparck Jones (1999; 2001) represent a good way of characterising summaries
they do not necessarily refer to automatic summaries
they do not even necessarily refer to summaries
there are three types of factors:
input factors: characterise the input document(s)
purpose factors: define the transformations necessary to obtain the output
output factors: characterise the produced summaries
Context factors
Input factors        Purpose factors        Output factors
Form                 Situation              Form
- Structure          Use                    - Structure
- Scale              Summary type           - Scale
- Medium             Coverage               - Medium
- Genre              Relation to source     - Language
- Language                                  - Format
- Format                                    Subject matter
Subject type
Unit
Input factors - Form
structure: explicit organisation of documents
Can be the problem-solution structure of scientific documents, the pyramidal structure of newspaper articles, or the presence of embedded structure in text (e.g. rhetorical patterns)
scale: the length of the documents
Different methods need to be used for a book and for a newspaper article due to very different compression rates
medium: natural language/sublanguage/specialised language
If the text is written in a sublanguage it is less ambiguous and therefore it is easier to process
Input factors - Form
language: monolingual/multilingual/cross-lingual
Monolingual: the source and the output are in the same language
Multilingual: the input is in several languages and the output in one of these languages
Cross-lingual: the language of the output is different from the language of the source(s)
formatting: whether the source is in any special format
This is more a programming problem, but needs to be taken into consideration if information is lost as a result of conversion
Input factors
Subject type: intended readership
Indicates whether the source was written for the general reader or for specific readers. It influences the amount of background information present in the source.
Unit: single/multiple sources (single vs. multi-document summarisation)
mainly concerned with the amount of redundancy in the text
Why are input factors useful?
The input factors can be used to decide whether to summarise a text or not:
Brandow, Mitze, and Rau (1995) use the structure of the document (presence of speech, tables, embedded lists, etc.) to decide whether to summarise it or not.
Louis and Nenkova (2009) train a system on DUC data to determine whether the result is expected to be reliable or not.
Purpose factors
Use: how the summary is used
retrieving: the user uses the summary to decide whether to read the whole document,
substituting: use the summary instead of the full document,
previewing: get the structure of the source, etc.
Summary type: indicates what kind of summary it is
indicative summaries provide a brief description of the source without going into details,
informative summaries follow the main ideas and structure of the source,
critical summaries give a description of the source and discuss its contents (e.g. review articles can be considered critical summaries)
Purpose factors
Relation to source: whether the summary is an extract or
abstract
extract: contains units directly extracted from the document
(i.e. paragraphs, sentences, clauses),
abstract: includes units which are not present in the source
Coverage: which type of information should be present in the
summary
generic: the summary should cover all the important
information of the document,
user-focused: the user indicates which should be the focus of
the summary
Output factors
Scale (also referred to as compression rate): indicates the length of the summary
American National Standards Institute Inc. (1979) recommends 250 words
Borko and Bernier (1975) point out that imposing an arbitrary limit on summaries is not good for their quality, but that a length of around 10% is usually enough
Hovy (2003) requires that the length of the summary is kept to less than half of the source's size
Goldstein et al. (1999) point out that the summary length seems to be independent of the length of the source
the structure of the output can be influenced by the structure of the input or by existing conventions
the subject matter can be the same as the input, or can be broader when background information is added
Evaluation of automatic summarisation
Why is evaluation necessary?
Evaluation is very important because it allows us to assess the results of a method or system
Evaluation allows us to compare the results of different methods or systems
Some types of evaluation allow us to understand why a method fails
almost every field has its own specific evaluation methods
there are several ways to perform evaluation, depending on:
How the system is considered
How humans interact with the evaluation process
What is measured
How the system is considered
black-box evaluation:
the system is considered opaque to the user
the system is considered as a whole
allows direct comparison between different systems
does not explain the system's performance
glass-box evaluation:
each of the system's components is assessed in order to understand how the final result is obtained
is very time consuming and difficult
relies on phenomena which are not fully understood (e.g. error propagation)
How humans interact with the process
off-line evaluation
also called automatic evaluation because it does not require human intervention
usually involves the comparison between the system's output and a gold standard
very often annotated corpora are used as gold standards
is usually preferred because it is fast and not directly influenced by human subjectivity
can be repeated
cannot be (easily) used in all fields
online evaluation
requires humans to assess the output of the system according to some guidelines
is useful for those tasks where the output of the system cannot be uniquely predicted (e.g. summarisation, text generation, question answering, machine translation)
is time consuming, expensive and cannot be easily repeated
What is measured
intrinsic evaluation:
evaluates the results of a system directly
for example: quality, informativeness
sometimes does not give a very accurate view of how useful the output can be for another task
extrinsic evaluation:
evaluates the results of another system which uses the results of the first
examples: post-edit measures, relevance assessment, reading comprehension
Evaluation used in automatic summarisation
evaluation is a very difficult task because there is no clear idea of what constitutes a good summary
the number of perfectly acceptable summaries of a text is not limited
four types of evaluation methods:

            Intrinsic                 Extrinsic
On-line     Direct evaluation         Task-based evaluation
Off-line    Target-based evaluation   Automatic evaluation
Direct evaluation
intrinsic & online evaluation
requires humans to read summaries and measure their quality and informativeness according to some guidelines
is one of the first evaluation methods used in automatic summarisation
to a certain extent it is quite straightforward, which makes it appealing for small-scale evaluation
it is time consuming, subjective and in many cases cannot be repeated by others
Direct evaluation: quality
tries to assess the quality of a summary independently from the source
can be a simple classification of sentences as acceptable or unacceptable
Minel, Nugier, and Piat (1997) proposed an evaluation protocol which considers the coherence, cohesion and legibility of summaries
the cohesion of a summary is measured in terms of dangling anaphors
the coherence in terms of discourse ruptures
the legibility is decided by jurors who are requested to classify each summary as very bad, bad, mediocre, good or very good
it does not assess the contents of a summary, so it could be misleading
Direct evaluation: informativeness
assesses how correctly the information in the source is reflected in the summary
the judges are required to read both the source and the summary, which makes the process longer and more expensive
judges are generally required to:
identify important ideas from the source which do not appear in the summary
identify ideas from the summary which are not important enough and therefore should not be there
identify the logical development of the ideas and see whether it appears in the summary
given that it is time consuming, automatic methods to compute informativeness are preferred
Target-based evaluation
is the most used evaluation method
compares the automatic summary with a gold standard
is most appropriate for extractive summarisation methods
is intrinsic and off-line
does not require humans to be involved in the evaluation
has the advantage of being fast and cheap, and can be repeated by other researchers
the drawback is that it requires a gold standard, which is usually not easy to produce
Corpora as gold standards
usually annotated corpora are used as gold standards
usually the annotation is very simple: for each sentence it indicates whether it is important enough to be included in the summary or not
such corpora are normally used to assess extracts
can be produced manually or automatically
these corpora normally represent one point of view
Manually produced corpora
require human judges to read each text from the corpus and to identify the important units in each text according to guidelines
Kupiec, Pedersen, and Chen (1995) and Teufel and Moens (1997) took advantage of the existence of human-produced abstracts and asked human annotators to align sentences from the document with sentences from the abstracts
it is not necessary to use specialised tools to apply this annotation, but in many cases they can help
Guidelines for manually annotated corpora
Edmundson (1969) annotated a heterogeneous corpus consisting of 200 documents in the fields of physics, life science, information science and humanities. The important sentences were considered to be those which indicated:
what the subject area is,
why the research is necessary,
how the problem is solved,
what the findings of the research are.
Hasler, Orasan, and Mitkov (2003) annotated a corpus of newspaper articles; the important sentences were considered those linked to the main topic of the text as indicated in the title (see http://clg.wlv.ac.uk/projects/CAST/ for the complete guidelines)
Problems with manually produced corpora
given how subjective the identification of important sentences is, the agreement between annotators is low
the inter-annotator agreement is determined by the genre of the texts and the length of the summaries
Hasler, Orasan, and Mitkov (2003) tried to measure the agreement between three annotators and noticed very low values, but
when the content is compared the agreement increases
Automatically produced corpora
relies on the fact that very often humans produce summaries by copy-pasting from the source
there are algorithms which identify sets of sentences from the source which cover the information in the summary
Marcu (1999) employed a greedy algorithm which eliminates sentences from the whole document that do not reduce the similarity between the summary and the remaining sentences (see the sketch below)
Jing and McKeown (1999) treat the human-produced abstract as a sequence of words which appears in the document, and reformulate the problem of alignment as the problem of finding the most likely position of the words from the abstract in the full document using a Hidden Markov Model
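A minimal sketch of the greedy idea behind Marcu (1999), under simplifying assumptions: similarity is plain bag-of-words cosine, and a sentence is dropped whenever its removal does not reduce the similarity between the remaining document and the abstract. The toy data and helper names are illustrative, not the original implementation.

import math
from collections import Counter

def cosine(a, b):
    # bag-of-words cosine similarity between two token lists
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def greedy_alignment(sentences, abstract):
    # drop sentences whose removal does not reduce similarity to the abstract
    kept = list(sentences)
    changed = True
    while changed:
        changed = False
        base = cosine([w for s in kept for w in s], abstract)
        for s in list(kept):
            rest = [t for t in kept if t is not s]
            if rest and cosine([w for t in rest for w in t], abstract) >= base:
                kept.remove(s)
                changed = True
                break
    return kept

doc = [s.split() for s in [
    "the gunman was killed by police",
    "the weather was cold",
    "police acted after a tip off",
]]
abstract = "police killed the gunman after a tip off".split()
print(greedy_alignment(doc, abstract))  # the off-topic sentence is dropped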
Evaluation measures used with annotated corpora
usually precision, recall and F-measure are used to calculate the performance of a system
the list of sentences extracted by the program is compared with the list of sentences marked by humans

                          Extracted by program   Not extracted by program
Extracted by humans       true positives         false negatives
Not extracted by humans   false positives        true negatives

Precision = TruePositives / (TruePositives + FalsePositives)
Recall = TruePositives / (TruePositives + FalseNegatives)
F_β = ((β^2 + 1) · Precision · Recall) / (β^2 · Precision + Recall)
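As a minimal sketch, the measures above can be computed directly from the two lists of sentence ids; β weights recall against precision (β = 1 gives the usual F1). The toy ids are invented for illustration.

def precision_recall_f(extracted, gold, beta=1.0):
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)                      # true positives
    p = tp / len(extracted) if extracted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = (beta**2 + 1) * p * r / (beta**2 * p + r) if p + r else 0.0
    return p, r, f

# toy example: program extracts sentences 2, 5, 7; humans marked 1, 2, 5
print(precision_recall_f([2, 5, 7], [1, 2, 5]))  # (0.667, 0.667, 0.667)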
Summary Evaluation Environment (SEE)
the SEE environment has been used in the DUC evaluations
is a combination of direct and target-based evaluation
requires humans to assess whether each unit from the automatic summary appears in the target summary
also offers the option to answer questions about the quality of the summary (e.g. "Does the summary build from sentence to sentence to a coherent body of information about the topic?")
Relative utility of sentences (Radev et al., 2000)
addresses the problem that humans often disagree when they are asked to select the top n% sentences from a document
each sentence in the document receives a score from 1 to 10 depending on how summary-worthy it is
the score of an automatic summary is the normalised score of the extracted sentences (see the sketch below)
when several judges are available, the score of a summary is the average over all judges
can be used for any compression rate
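A minimal sketch of the scoring step, assuming a single judge: the summary's utility is normalised by the best achievable utility for the same number of extracted sentences. Data and names are illustrative.

def relative_utility(extracted_ids, utility):
    # utility: sentence id -> judge score (1..10)
    achieved = sum(utility[i] for i in extracted_ids)
    best = sum(sorted(utility.values(), reverse=True)[:len(extracted_ids)])
    return achieved / best if best else 0.0

utility = {0: 9, 1: 4, 2: 7, 3: 2}        # hypothetical judge scores
print(relative_utility([0, 1], utility))  # (9 + 4) / (9 + 7) = 0.8125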
Target-based evaluation without annotated corpora
these methods require that the sources have a human-provided summary (but they do not need to be annotated)
Donaway et al. (2000) propose to use the cosine similarity between an automatic summary and a human summary, but it relies on word co-occurrences
ROUGE uses the number of overlapping units (Lin, 2004)
Nenkova and Passonneau (2004) proposed the pyramid evaluation method, which addresses the problem that different people select different content when writing summaries
ROUGE
ROUGE = Recall-Oriented Understudy for Gisting Evaluation (Lin, 2004)
inspired by BLEU (Bilingual Evaluation Understudy) used in machine translation (Papineni et al., 2002)
developed by Chin-Yew Lin and available at http://berouge.com
assesses the quality of a summary by comparison with ideal summaries
the metrics count the number of overlapping units
there are several versions depending on how the comparison is made
ROUGE-N
n-gram co-occurrence statistics; a recall-oriented metric (see the sketch below)
S1: Police killed the gunman
S2: Police kill the gunman
S3: The gunman kill police
S2 = S3 (both match 3 of the 4 unigrams of S1, so ROUGE-1 cannot separate them)
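A minimal sketch of ROUGE-N recall against a single reference; counts are clipped so a candidate cannot be rewarded for repeating an n-gram more often than the reference contains it.

from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum(min(count, ref[g]) for g, count in cand.items())  # clipped
    return overlap / sum(ref.values()) if ref else 0.0

ref = "police killed the gunman".split()
print(rouge_n("police kill the gunman".split(), ref))  # 0.75
print(rouge_n("the gunman kill police".split(), ref))  # 0.75 -- S2 = S3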
ROUGE-L
longest common subsequence (see the sketch below)
S1: police killed the gunman
S2: police kill the gunman
S3: the gunman kill police
S2 = 3/4 (police the gunman)
S3 = 2/4 (the gunman)
S2 > S3
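A minimal sketch of the recall side of ROUGE-L, using the classic dynamic-programming LCS; it reproduces the 3/4 vs 2/4 scores above.

def lcs_len(a, b):
    # dynamic programme over the two token sequences
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    return lcs_len(candidate, reference) / len(reference)

ref = "police killed the gunman".split()
print(rouge_l("police kill the gunman".split(), ref))  # 0.75
print(rouge_l("the gunman kill police".split(), ref))  # 0.5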
ROUGE-W
weighted longest common subsequence
S1: [A B C D E F G]
S2: [A B C D H I J]
S3: [A H B J C I D]
plain LCS scores S2 and S3 equally (both share the subsequence A B C D with S1); ROUGE-W favours consecutive matches, so S2 scores better than S3 (see the sketch below)
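A minimal sketch of the weighted LCS from Lin (2004): a consecutive run of k matches contributes f(k) = k^α rather than k, so S2's contiguous A B C D beats S3's scattered one. The α value is a tunable assumption.

def rouge_w(candidate, reference, alpha=2.0):
    f = lambda k: k ** alpha                     # weight for a run of length k
    m, n = len(reference), len(candidate)
    c = [[0.0] * (n + 1) for _ in range(m + 1)]  # accumulated weighted score
    w = [[0] * (n + 1) for _ in range(m + 1)]    # length of current diagonal run
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if reference[i - 1] == candidate[j - 1]:
                k = w[i - 1][j - 1]
                c[i][j] = c[i - 1][j - 1] + f(k + 1) - f(k)
                w[i][j] = k + 1
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]            # run broken: w stays 0
            else:
                c[i][j] = c[i][j - 1]
    return (c[m][n] / f(m)) ** (1.0 / alpha)     # normalised recall side

S1, S2, S3 = list("ABCDEFG"), list("ABCDHIJ"), list("AHBJCID")
print(rouge_w(S2, S1), rouge_w(S3, S1))  # S2 scores higher than S3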
ROUGE-S
ROUGE-S: skip-bigram recall metric; arbitrary in-sequence bigrams are counted (see the sketch below)
S1: police killed the gunman (police killed, police the, police gunman, killed the, killed gunman, the gunman)
S2: police kill the gunman (police the, police gunman, the gunman)
S3: the gunman kill police (the gunman)
S4: the gunman police killed (police killed, the gunman)
S2 better than S4, better than S3
ROUGE-SU adds unigrams to ROUGE-S
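A minimal sketch of ROUGE-S recall: skip-bigrams are ordered token pairs with arbitrary gaps, and the score is the fraction of the reference's skip-bigrams found in the candidate. It reproduces the S2 > S4 > S3 ranking above.

from itertools import combinations

def skip_bigrams(tokens):
    # all ordered pairs (w_i, w_j) with i < j, gaps allowed
    return set(combinations(tokens, 2))

def rouge_s(candidate, reference):
    ref = skip_bigrams(reference)
    return len(skip_bigrams(candidate) & ref) / len(ref) if ref else 0.0

ref = "police killed the gunman".split()
print(rouge_s("police kill the gunman".split(), ref))    # 3/6
print(rouge_s("the gunman kill police".split(), ref))    # 1/6
print(rouge_s("the gunman police killed".split(), ref))  # 2/6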
ROUGE
experiments on DUC 2000-2003 data show good correlation with human judgement
using multiple references achieved better correlation with human judgement than just using a single reference
stemming and removing stopwords improved correlation with human judgement
Task-based evaluation
is an extrinsic and on-line evaluation
instead of evaluating the summaries directly, humans are asked to perform tasks using summaries and the accuracy of these tasks is measured
the assumption is that the accuracy does not decrease when good summaries are used
the time needed should decrease
examples of tasks: classification of summaries according to predefined classes (Saggion and Lapalme, 2000), determining the relevance of a summary to a topic (Miike et al., 1994; Oka and Ueda, 2000), and reading comprehension (Morris, Kasper, and Adams, 1992; Orasan, Pekar, and Hasler, 2004)
Task-based evaluation
this evaluation can be very useful because it assesses a summary in real situations
it is time consuming and requires humans to be involved in the evaluation process
in order to obtain statistically significant results a large number of judges have to be involved
this evaluation method has been used in evaluation conferences
Automatic evaluation
extrinsic and off-line evaluation method
tries to replace humans in task-based evaluations with automatic methods which perform the same task and are evaluated automatically
Examples:
text retrieval (Brandow, Mitze, and Rau, 1995): increase in precision but drastic reduction of recall
text categorisation (Kolcz, Prabakarmurthi, and Kalita, 2001): the performance of categorisation increases
has the advantage of being fast and cheap, but in many cases the tasks which can benefit from summaries are as difficult to evaluate as automatic summarisation itself (e.g. Kuo et al. (2002) proposed to use QA)
The intrinsic-extrinsic spectrum of evaluations, from (Sparck Jones, 2007):
semi-purpose: inspection (e.g. for proper English)
quasi-purpose: comparison with models (e.g. n-grams, nuggets)
pseudo-purpose: simulation of task contexts (e.g. action scenarios)
full-purpose: operation in task context (e.g. report writing)
Evaluation conferences
evaluation conferences are conferences where all the participants have to complete the same task on a common set of data
these conferences allow direct comparison between the participants
such conferences brought quick advances in their fields: MUC (information extraction), TREC (information retrieval & question answering), CLEF (question answering for non-English languages and cross-lingual QA)
SUMMAC
the first evaluation conference organised in automatic summarisation (in 1998)
6 participants in the dry run and 16 in the formal evaluation
mainly extrinsic evaluation:
ad-hoc task: determine the relevance of the source document to a query (topic)
categorisation task: assign to each document a category on the basis of its summary
question answering task: answer questions using the summary
a small acceptability test where direct evaluation was used
SUMMAC
the TREC dataset was used
for the ad-hoc evaluation 20 topics, each with 50 documents, were selected
the time for the ad-hoc task halves, with a slight reduction in accuracy (which is not significant)
for the categorisation task 10 topics, each with 100 documents (5 categories)
there is no difference in the classification accuracy, and the time is reduced only for 10% summaries
more details can be found in (Mani et al., 1998)
Text Summarization Challenge
an evaluation conference organised in Japan whose main goal is to evaluate Japanese summarisers
it was organised using the SUMMAC model
precision and recall were used to evaluate single-document summaries
humans had to assess the relevance of summaries of texts retrieved for specific queries to these queries
it also included some readability measures (e.g. how many deletions, insertions and replacements were necessary)
more details can be found in (Fukusima and Okumura, 2001; Okumura, Fukusima, and Nanba, 2003)
Document Understanding Conference (DUC)
an evaluation conference organised as part of a larger program called TIDES (Translingual Information Detection, Extraction and Summarisation)
organised from 2000
at the beginning it was not that different from SUMMAC, but over time more difficult tasks were introduced:
2001: single and multi-document generic summaries with 50, 100, 200, 400 words
2002: single and multi-document generic abstracts with 50, 100, 200, 400 words, and multi-document extracts with 200 and 400 words
2003: abstracts of documents and document sets with 10 and 100 words, and focused multi-document summaries
Document Understanding Conference
in 2004 participants were required to produce short (<665 bytes) and very short (<75 bytes) summaries of single documents and document sets, short document profiles, and headlines
from 2004 ROUGE was used as the evaluation method
in 2005: short multiple-document summaries, user-oriented questions
in 2006: same as in 2005, but pyramid evaluation was also used
in 2007: 250-word summaries and a 100-word update task; pyramid evaluation was used as a community effort
in 2008 DUC became TAC (Text Analysis Conference)
more information available at: http://duc.nist.gov/
Structure of the course
1 Introduction to automatic summarisation
2 Important methods in automatic summarisation
How humans produce summaries
Single-document summarisation methods
Surface-based summarisation methods
Machine learning methods
Methods which exploit the discourse structure
Knowledge-rich methods
Multi-document summarisation methods
3 Automatic summarisation and the Internet
Ideal summary processing model
Source text(s) → Interpretation → Source representation →
Transformation → Summary representation → Generation → Summary text
How humans produce summaries
How humans summarise documents
Determining how humans summarise documents is a difficult
task because it requires interdisciplinary research
Endres-Niggemeyer (1998) breaks the process into three stages:
document exploration, relevance assessment and summary
production
these stages were determined through interviews with
professional summarisers
professional summarisers use a top-down approach
expert summarisers do not attempt to understand the
source in great detail; instead, they are trained to identify
snippets which contain important information
very few automatic summarisation methods use an approach
similar to humans
Document exploration
it is the first step
the source's title, outline, layout and table of contents are
examined
the genre of the text is investigated because very often each
genre dictates a certain structure
for example, expository texts are expected to have a
problem-solution structure
the abstractor's knowledge about the source is represented as
a schema
schema = an abstractor's prior knowledge of document types
and their information structure
Relevance assessment
at this stage summarisers identify the theme and the thematic
structure
theme = a structured mental representation of what the
document is about
this structure allows the identification of relations between text
chunks
it is used to identify important information and to delete
irrelevant and unnecessary information
the schema is populated with elements from the thematic
structure, producing an extended structure of the theme
Summary production
the summary is produced from the expanded structure of the
theme
in order to avoid producing a distorted summary, summarisers
rely mainly on copy/paste operations
the chunks which are copied are reorganised to fit the new
structure
standard sentence patterns are also used
summary production is a long process which requires several
iterations
checklists can be used
Single-document summarisation
methods
Single document summarisation
Produces summaries from a single document
There are two main approaches:
automatic text extraction produces extracts; also referred to
as "extract and rearrange"
automatic text abstraction produces abstracts; also referred
to as "understand and generate"
Automatic text extraction is the most used method to
produce summaries
Automatic text extraction
Extracts important sentences from the text using different
methods and produces an extract by displaying the important
sentences (usually in order of appearance)
A large proportion of the sentences used in human-produced
summaries have been extracted directly from the text or
contain only minor modifications
Uses different statistical, surface-based and machine learning
techniques to determine which sentences are important
First attempts were made in the 1950s
Automatic text extraction
These methods are quite robust
The main drawback of this method is that it overlooks the
way in which relationships between concepts in the text are
realised by the use of anaphoric links and other discourse
devices
Extracting paragraphs can solve some of these problems
Some methods involve excluding the unimportant sentences
instead of extracting the important sentences
Surface-based summarisation
methods
Term-based summarisation
It was the first method used to produce summaries (Luhn,
1958)
Relies on the assumption that important sentences contain a
large number of important words
The importance of a word is calculated using statistical
measures
Even though this method is very simple, it is still used in
combination with other methods
A demo summariser which relies on term frequency can be
found at:
http://clg.wlv.ac.uk/projects/CAST/demos.php
How to compute the importance of a word
Different methods can be used:
Term frequency: how frequent a word is in the document
TF*IDF: relies on how frequent a word is in a document and in
how many documents from a collection it appears:
TF*IDF(w) = TF(w) × log(N / n_w)
where N is the number of documents in the collection and
n_w is the number of documents containing w
other statistical measures; for examples see (Orasan, 2009)
Issues:
stoplists should be used
what should be counted: words, lemmas, truncation, stems
how to select the document collection
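As a concrete illustration, a minimal Python sketch of this formula (the function and variable names are illustrative, not from any existing system; documents are assumed to be already tokenised into lists of words):

import math
from collections import Counter

def tf_idf_scores(document_tokens, collection):
    """Compute a TF*IDF score for every word of one document.

    document_tokens: list of words of the document to summarise.
    collection: list of documents (each a list of words) used to
    estimate document frequencies.
    """
    tf = Counter(document_tokens)
    n_docs = len(collection)
    scores = {}
    for word, freq in tf.items():
        # number of documents in the collection containing the word
        df = sum(1 for doc in collection if word in doc)
        # guard against division by zero for unseen words
        scores[word] = freq * math.log(n_docs / (df or 1))
    return scores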
Term-based summarisation: the algorithm
(and can be used for other types of summarisers)
1 Score all the words in the source according to the selected
measure
2 Score all the sentences in the text by adding the scores of the
words from these sentences
3 Extract the sentences with top N scores
4 Present the extracted sentences in the original order
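A minimal Python sketch of these four steps, assuming sentences are already tokenised and lowercased, and using plain term frequency with a toy stoplist as the importance measure (any of the measures above could be substituted):

from collections import Counter

STOPLIST = {"the", "a", "an", "of", "in", "and", "to", "is"}  # toy stoplist

def term_based_summary(sentences, n=3):
    """sentences: list of tokenised sentences (lists of lowercased words)."""
    # 1. score all words in the source (here: term frequency)
    word_scores = Counter(w for s in sentences for w in s if w not in STOPLIST)
    # 2. score each sentence by adding the scores of its words
    sent_scores = [sum(word_scores[w] for w in s if w not in STOPLIST)
                   for s in sentences]
    # 3. pick the indices of the N top-scoring sentences
    top = sorted(range(len(sentences)), key=lambda i: sent_scores[i],
                 reverse=True)[:n]
    # 4. present the extracted sentences in their original order
    return [" ".join(sentences[i]) for i in sorted(top)]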
Position method
It was noticed that in some genres important sentences appear
in predefined positions
First used by Edmundson (1969)
Depends very much on the genre:
newswire: lead summary (the first few sentences of the text)
scientific papers: the first/last sentences of a paragraph are
relevant for the topic of the paragraph (Baxendale, 1958)
scientific papers: important information occurs in specific
sections of the document (introduction/conclusion)
Lin and Hovy (1997) use a corpus to determine where
these important sentences occur
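One way this intuition could be turned into a score; a hypothetical illustration in which the weights are arbitrary:

def position_scores(paragraphs):
    """Assign a simple position-based score to each sentence.

    paragraphs: list of paragraphs, each a list of sentences.
    First/last sentences of a paragraph get a bonus, and sentences
    in earlier paragraphs score higher overall.
    """
    scores = {}
    for p_idx, para in enumerate(paragraphs):
        for s_idx, sent in enumerate(para):
            score = 1.0 / (p_idx + 1)            # earlier paragraphs matter more
            if s_idx == 0 or s_idx == len(para) - 1:
                score += 0.5                     # paragraph-initial/final bonus
            scores[sent] = score
    return scores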
Title method
words in titles and headings are positively relevant to
summarisation
Edmundson (1969) noticed that increasing the score of
sentences which contain such words can lead to an increase in
performance of up to 8%
Cue words/indicating phrases
Makes use of words or phrases classified as positive or
negative which may indicate topicality and thus the
value of a sentence for an abstract
positive: significant, purpose, in this paper we show
negative: Figure 1, believe, hardly, impossible, pronouns
Paice (1981) proposes indicating phrases, which are basically
patterns (e.g. [In] this paper/report/article we/I show)
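A sketch of cue-word scoring in this spirit; the cue lists, weights and the indicating-phrase pattern are illustrative only:

import re

POSITIVE_CUES = {"significant", "purpose", "show"}    # illustrative
NEGATIVE_CUES = {"believe", "hardly", "impossible"}   # illustrative
# an indicating phrase in the style of Paice (1981)
INDICATING = re.compile(r"\b(in )?this (paper|report|article) (we|i) show\b",
                        re.IGNORECASE)

def cue_score(sentence):
    words = sentence.lower().split()
    score = sum(w in POSITIVE_CUES for w in words)
    score -= sum(w in NEGATIVE_CUES for w in words)
    if INDICATING.search(sentence):
        score += 3    # indicating phrases are strong evidence
    return score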
Methods inspired by IR (Salton et al.,
1997)
decomposes a document into a set of paragraphs
computes the similarity between paragraphs, which represents
the strength of the link between two paragraphs
similar paragraphs are considered to be those whose similarity
is above a threshold
paragraphs can be extracted according to different strategies
(e.g. the number of links they have, selecting connected
paragraphs)
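A rough sketch of this paragraph-map idea, assuming cosine similarity over term-frequency vectors; the threshold and the degree-based extraction strategy are just one of the strategies mentioned above:

import math
from collections import Counter

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a if w in b)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def salient_paragraphs(paragraphs, threshold=0.2, n=3):
    """paragraphs: list of tokenised paragraphs (lists of words).
    Returns the indices of the n most connected paragraphs,
    in original order."""
    vectors = [Counter(p) for p in paragraphs]
    degree = [0] * len(paragraphs)
    # link pairs of paragraphs whose similarity exceeds the threshold
    for i in range(len(paragraphs)):
        for j in range(i + 1, len(paragraphs)):
            if cosine(vectors[i], vectors[j]) > threshold:
                degree[i] += 1
                degree[j] += 1
    top = sorted(range(len(paragraphs)), key=lambda i: degree[i],
                 reverse=True)[:n]
    return sorted(top)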
How to combine dierent methods
Edmundson (1969) used a linear combination of features:
Weight(S) = α·Title(S) + β·Cue(S) + γ·Keyword(S) + δ·Position(S)
the weights were adjusted manually
the best system was cue + title + position
it is better to use machine learning methods to combine the
results of different modules
Machine learning methods
What is machine learning (ML)?
Mitchell (1997):
machine learning is concerned with the question of how to
construct computer programs that automatically improve with
experience
A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves
with experience E
What is machine learning? (2)
Reasoning is based on the similarity between new situations
and the ones present in the training corpus
In some cases it is possible to understand what is learnt
(e.g. if-then rules)
But in many cases the knowledge learnt by an algorithm
cannot be easily understood (instance-based learning, neural
networks)
ML for language processing
ML has been widely employed in a large number of NLP
applications, ranging from part-of-speech tagging and
syntactic parsing to word-sense disambiguation and
coreference resolution
In NLP both symbolic methods (e.g. decision trees,
instance-based classifiers) and numerically oriented statistical
and neural-network training approaches have been used
ML as classication task
Very often an NLP problem can be seen as a classication problem
POS tagging: finding the appropriate class of a word
Segmentation (e.g. noun phrase extraction): each word is
classified as the beginning, end or inside of a segment
Anaphora/coreference resolution: classifying candidates as
antecedent/non-antecedent
Summarisation as a classication task
Each example (instance) in the set to be learnt can be
described by a set of features f1, f2, ..., fn
The task is to find a way to assign an instance to one of the
m disjoint classes c1, c2, ..., cm
The automatic summarisation process is usually transformed
into a classification one
The features are different properties of sentences (e.g.
position, keywords, etc.)
Two classes: extract/do-not-extract
It is not always classification: scores or automatically learnt
rules can be used as well
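As a sketch, each sentence can be mapped to a feature dictionary and paired with an extract/do-not-extract label; the feature definitions below are illustrative:

def sentence_features(sentence, position, keywords):
    """Turn one sentence into a feature dictionary (illustrative features)."""
    words = sentence.lower().split()
    return {
        "length_ok": len(words) > 5,              # length above a threshold
        "position": position,                     # index in the document
        "has_keyword": any(w in keywords for w in words),
        "has_capitalised": any(w[0].isupper()
                               for w in sentence.split()[1:]),
    }

# a training instance pairs the features with a binary label:
# instance = (sentence_features(s, i, keywords), True)   # True = extract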
Kupiec et al. (1995)
used a Bayesian classifier to combine different features
the features were:
whether the length of the sentence is above a threshold (true/false)
contains cue words (true/false)
position in the paragraph (initial/middle/final)
contains keywords (true/false)
contains capitalised words (true/false)
the training and testing corpus consisted of 188 documents
with summaries
humans identified the sentences from the full text which were used
in the summary
the best combination was position + cue + length
Teufel and Moens (1997) used a similar method for sentence
extraction
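A compact sketch of a naive Bayes combination in the spirit of Kupiec et al. (1995); the probability estimates use simple add-one smoothing, and the training instances are assumed to pair feature dictionaries with extract/do-not-extract labels:

from collections import defaultdict

def train_naive_bayes(instances):
    """instances: list of (feature_dict, label) pairs,
    label in {True, False} meaning extract / do-not-extract."""
    counts = defaultdict(lambda: defaultdict(int))
    labels = defaultdict(int)
    for features, label in instances:
        labels[label] += 1
        for name, value in features.items():
            counts[label][(name, value)] += 1
    return counts, labels

def extract_probability(features, counts, labels):
    """P(extract | features), with add-one smoothing."""
    scores = {}
    for label in labels:
        p = labels[label] / sum(labels.values())     # prior P(label)
        for item in features.items():
            p *= (counts[label][item] + 1) / (labels[label] + 2)
        scores[label] = p
    total = sum(scores.values())
    return scores.get(True, 0.0) / total if total else 0.0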
Mani and Bloedorn (1998)
learn rules about how to classify sentences
features used:
location features: location of sentence in paragraph, sentence
in special section, etc.
thematic features: tf score, tf*idf score, number of section
heading words
cohesion features: number of sentences with a synonym link to
sentence
user focused features: number of terms relevant to the topic
Example of rule learnt: IF sentence in conclusion & tf*idf high
& compression = 20% THEN summary sentence
Other ML methods
Osborne (2002) used maximum entropy with features such as
word pairs, sentence length, sentence position, and discourse
features (e.g. whether the sentence follows the Introduction)
Knight and Marcu (2000) use a noisy-channel model for
sentence compression
Conroy et al. (2001) use HMMs
Most of the methods these days try to use machine learning
Methods which exploit the discourse
structure
Methods which exploit discourse cohesion
summarisation methods which use discourse structure usually
produce better-quality summaries because they consider the
relations between the extracted chunks
they rely on the global discourse structure
they are more difficult to implement because very often the
theories on which they are based are difficult and not fully
understood
there are methods which use text cohesion and methods which
use text coherence
very often it is difficult to control the length of summaries
produced in this way
Methods which exploit text cohesion
text cohesion involves relations between words, word senses and
referring expressions which determine how tightly connected
the text is
(S13) "All we want is justice in our own country," aboriginal
activist Charles Perkins told Tuesday's rally. ... (S14) "We
don't want budget cuts - it's hard enough as it is," said
Perkins
there are methods which exploit lexical chains and
coreferential chains
Lexical chains for text summarisation
Telepattan system (Benbrahim and Ahmad, 1995):
two sentences are linked if their words are related by repetition,
synonymy, class/superclass, or paraphrase
sentences which have a number of links above a threshold
form a bond
on the basis of the bonds a sentence has with the previous and
following sentences, it is possible to classify sentences as
topic-start, topic-middle or topic-end
sentences are extracted on the basis of this open-continue-end
topic structure
Barzilay and Elhadad (1997) implemented a more refined
version of the algorithm which includes ambiguity resolution
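A much-simplified sketch of the Telepattan idea, linking sentences by word repetition only (the original system also used synonymy, class/superclass relations and paraphrase):

def bonds(sentences, link_threshold=2):
    """sentences: list of sets of content words. Two sentences form a
    bond if they share at least link_threshold words."""
    n = len(sentences)
    bonded = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if len(sentences[i] & sentences[j]) >= link_threshold:
                bonded[i][j] = bonded[j][i] = True
    return bonded

def topic_role(bonded, i):
    """Classify a sentence by its bonds to previous/following sentences."""
    back = any(bonded[i][:i])
    forward = any(bonded[i][i + 1:])
    if forward and not back:
        return "topic-start"
    if back and not forward:
        return "topic-end"
    return "topic-middle" if back and forward else "isolated"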
Using coreferential chains for text
summarisation
method presented in (Azzam, Humphreys, Gaizauskas, 1999)
the underlying idea is that it is possible to capture the most
important topic of a document by using a principal
coreferential chain
the LaSIE system, extended with a focus-based algorithm for the
resolution of pronominal anaphora, was used to produce the
coreferential chains
Coreference chain selection
The summarisation module implements several selection criteria:
Length of chain: prefers the chain which contains the most entries,
i.e. the most mentioned instance in the text
Spread of the chain: the distance between the earliest and the
latest entry in each chain
Start of chain: prefers a chain which starts in the title or in the
first paragraph of the text (this criterion can be very useful
for some genres such as newswire)
Summarisation methods which use
rhetorical structure of texts
it is based on the Rhetorical Structure Theory (RST) (Mann
and Thompson, 1988)
according to this theory text is organised in non-overlapping
spans which are linked by rhetorical relations and can be
organised in a tree structure
there are two types of spans: nuclei and satellites
a nucleus can be understood without satellites, but not the
other way around
satellites can be removed in order to obtain a summary
the most difficult part is building the rhetorical structure of a
text
Ono, Sumita and Miike (1994), Marcu (1997) and
Corston-Oliver (1998) present summarisation methods which
use the rhetorical structure of the text
Example of a rhetorical structure tree, from (Marcu, 2000)
Summarisation using argumentative
zoning
Teufel and Moens (2002) exploit the structure of scientific
documents in order to produce summaries
the summarisation process is split into two parts:
1 identification of important sentences, using an approach similar
to the one proposed by Kupiec, Pederson, and Chen (1995)
2 recognition of the rhetorical roles of the extracted sentences
for rhetorical roles the following classes are used: Aim,
Textual, Own, Background, Contrast, Basis, Other
Knowledge-rich methods
Knowledge-rich methods
Produce abstracts
Most of them try to understand (at least partially) a text
and to make inferences before generating the summary
The systems do not really understand the contents of the
documents, but they use different techniques to extract
the meaning
Since this process involves a huge amount of world knowledge,
the application is restricted to a specific domain only
Knowledge-rich methods
The abstracts obtained in this way are better in terms of
cohesion and coherence
The abstracts produced in this way tend to be more
informative
This method is also known as the "understand and generate"
approach
This method extracts the information from the text and holds
it in some intermediate form
The representation is then used as the input for a natural
language generator to produce an abstract
FRUMP (deJong, 1982)
uses sketchy scripts to understand a situation
these scripts only keep the information relevant to the event
and discard the rest
50 scripts were manually created
words from the source activate scripts and heuristics are used
to decide which script is used in case more than one script is
activated
Example of script used by FRUMP
1 The demonstrators arrive at the demonstration location
2 The demonstrators march
3 The police arrive on the scene
4 The demonstrators communicate with the target of the
demonstration
5 The demonstrators attack the target of the demonstration
6 The demonstrators attack the police
7 The police attack the demonstrators
8 The police arrest the demonstrators
FRUMP
the evaluation of the system revealed that it could not process
a large number of stories because it did not have the
appropriate scripts
the system is very difficult to port to a different domain
sometimes it can misinterpret a story: "Vatican City. The
death of the Pope shakes the world. He passed away..." was
summarised as "Earthquake in the Vatican. One dead."
the advantage of this method is that the output can be in any
language
Concept-based abstracting (Paice and
Jones, 1993)
Also referred to as "extract and generate"
Summaries in the field of agriculture
Relies on predefined text patterns such as "this paper studies
the effect of [AGENT] on the [HLP] of [SPECIES]", instantiated
as, e.g., "This paper studies the effect of G. pallida on the yield
of potato."
The summarisation process involves the instantiation of patterns
with concepts from the source
Each pattern has a weight which is used to decide whether the
generated sentence is included in the output
This method is good for producing informative summaries
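A toy illustration of pattern instantiation; the pattern string and the way fillers are provided are deliberately naive compared with the original system, which found the concepts in grammatical context:

PATTERN = "This paper studies the effect of {agent} on the {hlp} of {species}."

def instantiate(agent, hlp, species):
    # in the real system the fillers are concepts extracted from the
    # source, and each pattern carries a weight used to filter the
    # generated sentences
    return PATTERN.format(agent=agent, hlp=hlp, species=species)

print(instantiate("G. pallida", "yield", "potato"))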
Other knowledge-rich methods
Rumelhart (1975) developed a system to understand and
summarise simple stories, using a grammar which generated
semantic interpretations of the story on the basis of
hand-coded rules.
Alterman (1986) used local understanding
Fum, Guida, and Tasso (1985) try to replicate the human
summarisation process
Rau, Jacobs, and Zernik (1989) integrate a bottom-up
linguistic analyser and a top-down conceptual interpretation
Multi-document summarisation
methods
Multi-document summarisation
multi-document summarisation is the extension of
single-document summarisation to collections of related
documents
very rarely can methods from single-document summarisation
be used directly
it is not possible to produce single-document summaries from
every document in the collection and then to concatenate
them
normally they are user-focused summaries
Issues with multi-document summaries
the collections to be summarised can vary a lot in size, so
different methods might need to be used
a much higher compression rate is needed
redundancy
ordering of sentences (usually the date of publication is used)
similarities and differences between texts need to be
considered
contradictory information
fragmentary information
IR inspired methods
the method of Salton et al. (1997) can be adapted to multi-document
summarisation
instead of using paragraphs from one document, paragraphs
from all the documents are used
the extraction strategies are kept
Maximal Marginal Relevance
proposed by (Goldstein et al., 2000)
addresses the redundancy among multiple documents
allows a balance between the diversity of the information and
relevance to a user query
MMR(Q, R, S) = argmax_{D_i ∈ R\S} [ λ·Sim1(D_i, Q) − (1 − λ)·max_{D_j ∈ S} Sim2(D_i, D_j) ]
where Q is the query, R the set of retrieved documents, S the set of
already selected documents, and λ balances relevance against redundancy
can be used also for single document summarisation
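A sketch of greedy MMR selection; the similarity functions are assumed to be given (e.g. cosine similarity over TF*IDF vectors), and lam plays the role of λ:

def mmr_select(candidates, sim_to_query, sim, lam=0.7, k=5):
    """Greedy MMR: repeatedly pick the candidate that is relevant to
    the query but maximally dissimilar from what is already selected.

    candidates: list of sentence/document ids
    sim_to_query: dict id -> Sim1(D_i, Q)
    sim: function (id, id) -> Sim2(D_i, D_j)
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr(d):
            redundancy = max((sim(d, s) for s in selected), default=0.0)
            return lam * sim_to_query[d] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected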
Cohesion text maps
use knowledge based on lexical cohesion (Mani and Bloedorn,
1999)
good for comparing pairs of documents and telling what is common
and what is different
builds a graph from the texts: the nodes of the graph are the
words of the text; arcs represent adjacency, grammatical,
co-reference, and lexical similarity-based relations
sentences are scored using the tf*idf metric
the user query is used to traverse the graph (spreading activation
is used)
to minimise redundancy in extracts, extraction can be greedy
so as to cover as many different terms as possible
Theme fusion (Barzilay et al., 1999)
used to avoid redundancy in multi-document summaries
Theme = collection of similar sentences drawn from one or
more related documents
Computes theme intersection: phrases which are common to
all sentences in a theme
paraphrasing rules are used (active vs. passive, different orders
of adjuncts, classifier vs. apposition, ignoring certain
premodifiers in NPs, synonymy)
generation is used to put the theme intersection together
Centroid based summarisation
a centroid = a set of words that are statistically important to
a cluster of documents
each document is represented as a weighted vector of TF*IDF
scores
each sentence receives a score equal to the sum of the
individual centroid values of its words
sentence salience (Boguraev and Kennedy, 1999)
centroid score (Radev, Jing, and Budzikowska, 2000)
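A minimal sketch of centroid construction and sentence scoring, assuming TF*IDF vectors for the documents in the cluster are already available:

from collections import Counter

def centroid(doc_vectors, top_n=50):
    """Average the TF*IDF vectors of a cluster and keep the
    strongest words as the centroid."""
    total = Counter()
    for vec in doc_vectors:
        total.update(vec)
    avg = {w: v / len(doc_vectors) for w, v in total.items()}
    return dict(sorted(avg.items(), key=lambda x: x[1],
                       reverse=True)[:top_n])

def centroid_score(sentence_words, centroid_vec):
    # a sentence scores the sum of the centroid values of its words
    return sum(centroid_vec.get(w, 0.0) for w in sentence_words)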
Cross-document Structure Theory
Cross-document Structure Theory (CST) provides a theoretical model
for issues that arise when trying to summarise multiple texts (Radev,
Otterbacher, and Zhang, 2004)
it describes relationships between two or more sentences from
different source documents related to the same topic
similar to RST, but at the cross-document level
18 domain-independent relations such as identity, equivalence,
subsumption, contradiction, overlap, fulfilment and
elaboration between text spans
can be used to extract sentences and avoid redundancy
Automatic summarisation and the
Internet
New research topics have emerged at the confluence of
summarisation with other disciplines (e.g. question answering
and opinion mining)
Many of these fields appeared as a result of the expansion of
the Internet
The Internet is probably the largest source of information, but
it is largely unstructured and heterogeneous
Multi-document summarisation is more necessary than ever
Web content mining = the extraction of useful information from
the Web
Challenges posed by the Web
Huge amount of information
Wide and diverse
Information of all types, e.g. structured data, texts, videos, etc.
Semi-structured
Linked
Redundant
Noisy
Summarisation of news on the Web
Newsblaster (McKeown et al., 2002) summarises news from
the Web (http://newsblaster.cs.columbia.edu/)
it is mainly statistical, but with symbolic elements
it crawls the Web to identify stories (e.g. filters out ads),
clusters them on specific topics and produces a
multi-document summary
theme sentences are analysed and fused together to produce
the summary
summaries also contain images, selected using high-precision rules
similar services: NewsInEssence, Google News, News Explorer
tracking and updating are important features of such systems
Email summarisation
email summarisation is more difficult because emails have a
dialogue structure
Muresan et al. (2001) use machine learning to learn rules for
salient NP extraction
Nenkova and Bagga (2003) developed a set of rules to
extract important sentences
Newman and Blitzer (2003) use clustering to group messages
together and then extract a summary from each cluster
Rambow et al. (2004) automatically learn rules to extract
sentences from emails
these methods do not use many email-specific features, but in
general the subject of the first email is used as a query
Blog summarisation
Zhou et al. (2006) see a blog entry as a summary of a news
story with personal opinions added; they produce a
summary by deleting sentences not related to the story
Hu et al. (2007) use blog comments to identify words that
can be used to extract sentences from blogs
Conrad et al. (2009) developed query-based opinion
summarisation for legal blog entries based on their TAC 2008
system
Opinion mining and summarisation
find what reviewers liked and disliked about a product
there is usually a large number of reviews, so an opinion summary
should be produced
visualisation of the result is important, and it may not be a text
analogous to, but different from, multi-document summarisation
Producing the opinion summary
A three stage process:
1 Extract object features that have been commented on in each
review.
2 Classify each opinion
3 Group feature synonym and produce the summary (pro vs.
cons, detailed review, graphical representation)
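A toy sketch of stages 2 and 3, classifying opinions with a small hand-made lexicon and aggregating them per product feature; a real system would use a trained classifier and proper feature-synonym grouping:

POSITIVE = {"great", "good", "excellent", "love"}    # toy lexicon
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def opinion_summary(opinions):
    """opinions: list of (feature, sentence) pairs already extracted
    in stage 1. Returns per-feature counts of opinion polarity."""
    summary = {}
    for feature, sentence in opinions:
        words = set(sentence.lower().split())
        pos = len(words & POSITIVE)
        neg = len(words & NEGATIVE)
        label = ("positive" if pos > neg
                 else "negative" if neg > pos else "neutral")
        summary.setdefault(feature,
                           {"positive": 0, "negative": 0, "neutral": 0})
        summary[feature][label] += 1
    return summary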
Opinion summaries
Mao and Lebanon (2007) suggest producing summaries that
track the sentiment flow within a document, i.e. how
sentiment orientation changes from one sentence to the next
Pang and Lee (2008) suggest creating subjectivity extracts
sometimes graph-based output seems much more appropriate
or useful than text-based output
in traditional summarisation redundant information is often
discarded; in opinion summarisation one wants to track and
report the degree of redundancy, since in the opinion-oriented
setting the user is typically interested in the (relative) number
of times a given sentiment is expressed in the corpus
there is much more contradictory information
Opinion summarisation at TAC
the Text Analysis Conference 2008 (TAC) included an
opinion summarisation task based on blogs
http://www.nist.gov/tac/
the task was to generate summaries of opinions about targets,
e.g. "What features do people dislike about Vista?"
a question answering system is used to extract snippets that
are passed to the summariser
QA and Summarisation at INEX2009
the QA track at INEX2009 requires participants to answer
factual and complex questions
the complex questions require aggregating the answer from
several documents, e.g. "What are the main applications of
Bayesian networks in the field of bioinformatics?"
for complex questions, evaluators will mark syntactic
incoherence, unresolved anaphora, redundancy and not
answering the question
Wikipedia will be used as the document collection
Conclusions
research in automatic summarisation is still very active, but
in many cases it merges with other fields
evaluation is still a problem in summarisation
the current state of the art is still sentence extraction
more language understanding needs to be added to the
systems
Thank you!
More information and updates at:
http://www.summarizationonline.info
References
Alterman, Richard. 1986. Summarisation in small. In N. Sharkey, editor, Advances in Cognitive Science. Ellis Horwood, Chichester, England.
American National Standards Institute Inc. 1979. American National Standard for Writing Abstracts. Technical Report ANSI Z39.14-1979, American National Standards Institute, New York.
Baxendale, Phyllis B. 1958. Man-made index for technical literature - an experiment. I.B.M. Journal of Research and Development, 2(4):354–361.
Boguraev, Branimir and Christopher Kennedy. 1999. Salience-based content characterisation of text documents. In Inderjeet Mani and Mark T. Maybury, editors, Advances in Automatic Text Summarization. The MIT Press, pages 99–110.
Borko, Harold and Charles L. Bernier. 1975. Abstracting Concepts and Methods. Academic Press, London.
Brandow, Ronald, Karl Mitze, and Lisa F. Rau. 1995. Automatic condensation of electronic publications by sentence selection. Information Processing & Management, 31(5):675–685.
Cleveland, Donald B. 1983. Introduction to Indexing and Abstracting. Libraries Unlimited, Inc.
Conroy, James M., Judith D. Schlesinger, Dianne P. O'Leary, and Mary E. Okurowski. 2001. Using HMM and logistic regression to generate extract summaries for DUC. In Proceedings of the 1st Document Understanding Conference, New Orleans, Louisiana, USA, September 13–14.
DeJong, G. 1982. An overview of the FRUMP system. In W. G. Lehnert and M. H. Ringle, editors, Strategies for Natural Language Processing. Lawrence Erlbaum, Hillsdale, NJ, pages 149–176.
Edmundson, H. P. 1969. New methods in automatic extracting. Journal of the Association for Computing Machinery, 16(2):264–285, April.
Endres-Niggemeyer, Brigitte. 1998. Summarizing Information. Springer.
Fukusima, Takahiro and Manabu Okumura. 2001. Text Summarization Challenge: Text summarization evaluation in Japan (TSC). In Proceedings of the Automatic Summarization Workshop.
Fum, Danilo, Giovanni Guida, and Carlo Tasso. 1985. Evaluating importance: a step towards text summarisation. In Proceedings of the 9th International Joint Conference on Artificial Intelligence, pages 840–844, Los Altos, CA, August.
Goldstein, Jade, Mark Kantrowitz, Vibhu Mittal, and Jaime Carbonell. 1999. Summarizing text documents: Sentence selection and evaluation metrics. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 121–128, Berkeley, California, August 15–19.
Goldstein, Jade, Vibhu O. Mittal, Jamie Carbonell, and Mark Kantrowitz. 2000. Multi-document summarization by sentence extraction. In Udo Hahn, Chin-Yew Lin, Inderjeet Mani, and Dragomir R. Radev, editors, Proceedings of the Workshop on Automatic Summarization at the 6th Applied Natural Language Processing Conference and the 1st Conference of the North American Chapter of the Association for Computational Linguistics, Seattle, WA, April.
Graetz, Naomi. 1985. Teaching EFL students to extract structural information from abstracts. In J. M. Ulign and A. K. Pugh, editors, Reading for Professional Purposes: Methods and Materials in Teaching Languages. Acco, Leuven, pages 123–135.
Hasler, Laura, Constantin Orasan, and Ruslan Mitkov. 2003. Building better corpora for summarisation. In Proceedings of Corpus Linguistics 2003, pages 309–319, Lancaster, UK, March 28–31.
Hovy, Eduard. 2003. Text summarisation. In Ruslan Mitkov, editor, The Oxford Handbook of Computational Linguistics. Oxford University Press, pages 583–598.
Jing, Hongyan and Kathleen R. McKeown. 1999. The decomposition of human-written summary sentences. In Proceedings of the 22nd International Conference on Research and Development in Information Retrieval (SIGIR'99), pages 129–136, University of Berkeley, CA, August.
Johnson, Frances. 1995. Automatic abstracting research. Library Review, 44(8):28–36.
Knight, Kevin and Daniel Marcu. 2000. Statistics-based summarization - step one: Sentence compression. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI), pages 703–710, Austin, Texas, USA, July 30 - August 3.
Kolcz, Aleksander, Vidya Prabakarmurthi, and Jugal Kalita. 2001. Summarization as feature selection for text categorization. In Proceedings of the 10th International Conference on Information and Knowledge Management, pages 365–370, Atlanta, Georgia, US, October 5–10.
Kuo, June-Jei, Hung-Chia Wung, Chuan-Jie Lin, and Hsin-Hsi Chen. 2002. Multi-document summarization using informative words and its evaluation with a QA system. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002), pages 391–401, Mexico City, Mexico, February 17–23.
Kupiec, Julian, Jan Pederson, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th ACM/SIGIR Annual Conference on Research and Development in Information Retrieval, pages 68–73, Seattle, July 9–13.
Lin, Chin-Yew. 2004. ROUGE: a package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain, July 25–26.
Lin, Chin-Yew and Eduard Hovy. 1997. Identifying topic by position. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 283–290, Washington, DC, March 31 - April 3.
Louis, Annie and Ani Nenkova. 2009. Performance confidence estimation for automatic summarization. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 541–548, Athens, Greece, March 30 - April 3.
Luhn, H. P. 1958. The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2):159–165.
Mani, Inderjeet and Eric Bloedorn. 1998. Machine learning of generic and user-focused summarization. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pages 821–826, Madison, Wisconsin. MIT Press.
Mani, Inderjeet and Eric Bloedorn. 1999. Summarizing similarities and differences among related documents. In Inderjeet Mani and Mark T. Maybury, editors, Advances in Automatic Text Summarization. The MIT Press, chapter 23, pages 357–379.
Mani, Inderjeet, Therese Firmin, David House, Michael Chrzanowski, Gary Klein, Lynette Hirshman, Beth Sundheim, and Leo Obrst. 1998. The TIPSTER SUMMAC text summarisation evaluation: Final report. Technical Report MTR 98W0000138, The MITRE Corporation.
Mani, Inderjeet and Mark T. Maybury, editors. 1999. Advances in Automatic Text Summarization. MIT Press.
Marcu, Daniel. 1999. The automatic construction of large-scale corpora for summarization research. In Proceedings of the 22nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99), pages 137–144, Berkeley, CA, August 15–19.
Marcu, Daniel. 2000. The Theory and Practice of Discourse Parsing and Summarization. The MIT Press.
Miike, Seiji, Etsuo Itoh, Kenji Ono, and Kazuo Sumita. 1994. A full-text retrieval system with a dynamic abstract generation function. In Proceedings of the 17th ACM SIGIR Conference, pages 152–161, Dublin, Ireland, July 3–6. ACM/Springer.
Minel, Jean-Luc, Sylvaine Nugier, and Gerald Piat. 1997. How to appreciate the quality of automatic text summarization? In Proceedings of the ACL'97/EACL'97 Workshop on Intelligent Scalable Text Summarization, pages 25–30, Madrid, Spain, July 11.
Morris, Andrew H., George M. Kasper, and Dennis A. Adams. 1992. The effect and limitations of automatic text condensing on reading comprehension performance. Information Systems Research, 3(1):17–35.
Oka, Mamiko and Yoshihiro Ueda. 2000. Evaluation of phrase-representation summarization based on information retrieval task. In NAACL-ANLP 2000 Workshop on Automatic Summarization, pages 59–68, Seattle, Washington, April 30.
Okumura, Manabu, Takahiro Fukusima, and Hidetsugu Nanba. 2003. Text Summarization Challenge 2: Text summarization evaluation at NTCIR Workshop 3. In Proceedings of the HLT-NAACL 2003 Workshop on Text Summarization, pages 49–56, Edmonton, Alberta, Canada, May 31 - June 1.
Orasan, Constantin. 2009. Comparative evaluation of term-weighting methods for automatic summarization. Journal of Quantitative Linguistics, 16(1):67–95.
Orasan, Constantin, Viktor Pekar, and Laura Hasler. 2004. A comparison of summarisation methods based on term specificity estimation. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC2004), pages 1037–1041, Lisbon, Portugal, May 26–28.
Osborne, M. 2002. Using maximum entropy for sentence extraction. In Proceedings of the ACL 2002 Workshop on Automatic Summarization.
Paice, Chris D. 1981. The automatic generation of literature abstracts: an approach based on the identification of self-indicating phrases. In R. N. Oddy, C. J. Rijsbergen, and P. W. Williams, editors, Information Retrieval Research. Butterworths, London, pages 172–191.
Papineni, K., S. Roukos, T. Ward, and W. J. Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), pages 311–318.
Radev, Dragomir, Jahna Otterbacher, and Zhu Zhang. 2004. CSTBank: A corpus for the study of cross-document structural relationships. In Proceedings of the Language Resources and Evaluation Conference (LREC 2004), Lisbon, Portugal.
Radev, Dragomir R., Hongyan Jing, and Malgorzata Budzikowska. 2000. Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation and user studies. In Proceedings of the NAACL/ANLP Workshop on Automatic Summarization, pages 21–29, Seattle, WA, USA, April 30.
Rau, Lisa F., Paul S. Jacobs, and Uri Zernik. 1989. Information extraction and text summarisation using linguistic knowledge acquisition. Information Processing & Management, 25(4):419–428.
Rumelhart, E. 1975. Notes on a schema for stories. In D. G. Bobrow and A. Collins, editors, Representation and Understanding: Studies in Cognitive Science. Academic Press Inc, pages 211–236.
Saggion, Horacio and Guy Lapalme. 2000. Concept identification and presentation in the context of technical text summarization. In NAACL-ANLP 2000 Workshop on Automatic Summarization, pages 1–10, Seattle, Washington, April 30.
Salton, Gerard, Amit Singhal, Mandar Mitra, and Chris Buckley. 1997. Automatic text structuring and summarization. Information Processing and Management, 33(3):193–207.
Sparck Jones, Karen. 1999. Automatic summarizing: factors and directions. In Inderjeet Mani and Mark T. Maybury, editors, Advances in Automatic Text Summarization. The MIT Press, chapter 1, pages 1–12.
Sparck Jones, Karen. 2001. Factorial summary evaluation. In Proceedings of the Workshop on Text Summarization (DUC 2001), New Orleans, Louisiana, USA, September 13–14.
Sparck Jones, Karen. 2007. Automatic summarising: The state of the art. Information Processing and Management, 43:1449–1481.
Teufel, Simone and Marc Moens. 1997. Sentence extraction as a classification task. In Proceedings of the ACL'97/EACL'97 Workshop on Intelligent Scalable Text Summarization, pages 58–59, Madrid, Spain, July 11.
Teufel, Simone and Marc Moens. 2002. Summarizing scientific articles: Experiments with relevance and rhetorical status. Computational Linguistics, 28(4):409–445.
van Dijk, Teun A. 1980. Text and Context: Explorations in the Semantics and Pragmatics of Discourse. Longman, London.