
© Copyright 2010

Jeremy G. Kahn

Parse decoration of the word sequence in the speech-to-text machine-translation pipeline

Jeremy G. Kahn

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

University of Washington

2010

Program Authorized to Offer Degree: Linguistics

University of Washington Graduate School

This is to certify that I have examined this copy of a doctoral dissertation by

Jeremy G. Kahn

and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made.

Chair of the Supervisory Committee:

Mari Ostendorf

Reading Committee:

Mari Ostendorf

Paul Aoki

Emily M. Bender

Fei Xia

Date:

In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with “fair use” as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to Proquest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346, 1-800-521-0600, to whom the author has granted “the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform.”

Signature

Date

University of Washington

Abstract

Parse decoration of the word sequence in the speech-to-text machine-translation pipeline

Jeremy G. Kahn

Chair of the Supervisory Committee:

Professor Mari Ostendorf
Electrical Engineering & Linguistics

Parsing, or the extraction of syntactic structure from text, is appealing to natural language processing (NLP) engineers and researchers. Parsing provides an opportunity to consider information about word sequence and relatedness beyond simple adjacency. This dissertation uses automatically derived syntactic structure (parse decoration) to improve the performance and evaluation of large-scale NLP systems that have (in general) used only word-sequence-level measures to quantify success. In particular, this work focuses on parse structure in the context of large-vocabulary automatic speech recognition (ASR) and statistical machine translation (SMT) in English and (in translation) Mandarin Chinese. The research here explores three characteristics of statistical syntactic parsing: dependency structure, constituent structure, and parse uncertainty — making use of the parser's ability to generate an M-best list of parse hypotheses.

Parse structure predictions are applied to ASR to improve word-error rate over a baseline non-syntactic (sequence-only) language model (achieving 6–13% of the possible error reduction). Critical to this success is the joint reranking of an N×M-best list of N ASR hypothesis transcripts and M-best parse hypotheses (for each transcript). Jointly reranking the N×M lists is also demonstrated to be useful in choosing a high-quality parse from these transcriptions.

In SMT, this work demonstrates expected dependency pair match (EDPM), a new mechanism for evaluating the quality of SMT translation hypotheses by comparing them to reference translations. EDPM, which makes direct use of parse dependency structure in its measurement, is demonstrated to correlate better with human measurements of translation quality than the widely used competitor evaluation metrics BLEU4 and translation edit rate.

Finally, this work explores how syntactic constituents may predict or improve the behavior of unsupervised word aligners, a core component of SMT systems, over a collection of Chinese-English parallel text with reference alignment labels. Statistical word alignment is improved over several machine-generated alignments by exploiting the coherence of certain parse constituent structures to identify source-language regions where a high-recall aligner may be trusted.

Together, these diverse results across ASR and SMT point to the utility of including parse information in large-scale (and generally word-sequence-oriented) NLP systems, and demonstrate several approaches for doing so.

TABLE OF CONTENTS

Page

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Evaluating the word sequence . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Using parse information within automatic language processing . . . . . . . 4
1.3 Overview of this work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Chapter 2: Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Statistical parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Reranking n-best lists . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Automatic speech recognition . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Statistical machine translation . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Chapter 3: Parsing Speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Corpus and experimental setup . . . . . . . . . . . . . . . . . . . . . . . . 43
3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Chapter 4: Using grammatical structure to evaluate machine translation . . . . . . 61
4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.2 Approach: the DPM family of metrics . . . . . . . . . . . . . . . . . . . . . 63
4.3 Implementation of the DPM family . . . . . . . . . . . . . . . . . . . . . . 66
4.4 Selecting EDPM with human judgements of fluency & adequacy . . . . . . . . . 68
4.5 Correlating EDPM with HTER . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.6 Combining syntax with edit and semantic knowledge sources . . . . . . . . . . 74
4.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Chapter 5: Measuring coherence in word alignments for automatic statistical
machine translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.2 Coherence on bitext spans . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.3 Corpus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.4 Analyzing span coherence among automatic word alignments . . . . . . . . . . 88
5.5 Selecting whole candidates with a reranker . . . . . . . . . . . . . . . . . 95
5.6 Creating hybrid candidates by merging alignments . . . . . . . . . . . . . . 101
5.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

Chapter 6: Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.1 Summary of key contributions . . . . . . . . . . . . . . . . . . . . . . . . 107
6.2 Future directions for these applications . . . . . . . . . . . . . . . . . . 109
6.3 Future challenges for parsing as a decoration on the word sequence . . . . . 111

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

LIST OF FIGURES

Figure Number                                                                  Page

2.1 A lexicalized phrase structure and the corresponding constituent and
    dependency trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 The models that contribute to ASR . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Word alignment between e and f . . . . . . . . . . . . . . . . . . . . . . . 22
2.4 The models that make up statistical machine translation systems . . . . . . . 24
3.1 A SParseval example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 System architecture at test time . . . . . . . . . . . . . . . . . . . . . . 37
3.3 n-best resegmentation using confusion networks . . . . . . . . . . . . . . . 38
3.4 Oracle parse performance contours for different numbers of parses M and
    recognition hypotheses N on reference segmentations . . . . . . . . . . . . . 51
3.5 SParseval performance for different feature and optimization conditions as
    a function of the size of the N-best list . . . . . . . . . . . . . . . . . . 56
4.1 Example dependency trees and their dlh decompositions . . . . . . . . . . . . 64
4.2 The dl and lh decompositions of the hypothesis tree in figure 4.1 . . . . . . 64
4.3 An example headed constituent tree and the labeled dependency tree derived
    from it . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4 Pearson's r for various feature tunings, with 95% confidence intervals.
    EDPM, BLEU and TER correlations are provided for . . . . . . . . . . . . . . 76
5.1 A Chinese sentence and its translation, with reference alignments and
    alignments generated by unioned GIZA++ . . . . . . . . . . . . . . . . . . . 80
5.2 Examples of the four coherence classes . . . . . . . . . . . . . . . . . . . 83
5.3 Decision trees for VP and IP spans . . . . . . . . . . . . . . . . . . . . . 93
5.4 An example incoherent . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.5 An example of clause-modifying adverb appearing inside a verb chain . . . . . 96
5.6 An example of English ellipsis where Chinese repeats a word . . . . . . . . . 97
5.7 Example of an NP-guided union . . . . . . . . . . . . . . . . . . . . . . . . 103

LIST OF TABLES

Table Number                                                                   Page

1.1 Two ASR hypotheses with the same WER . . . . . . . . . . . . . . . . . . . . 3
1.2 Word-sequences not considered to match by naïve word-sequence evaluation . . 3
3.1 Reranker feature descriptions . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2 Switchboard data partitions . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3 Segmentation conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4 Baseline and oracle WER reranking performance from N = 50 word sequence
    hypotheses and 1-best parse . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.5 Oracle SParseval (WER) reranking performance from N = 50 word sequence
    hypotheses and M = 1, 10, or 50 parses . . . . . . . . . . . . . . . . . . . 51
3.6 Reranker feature combinations . . . . . . . . . . . . . . . . . . . . . . . . 52
3.7 WER on the evaluation set for different sentence segmentations and feature
    sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.8 Word error rate results comparing γ . . . . . . . . . . . . . . . . . . . . . 54
3.9 Results under different segmentation conditions when optimizing for
    SParseval objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.1 Per-segment correlation with human fluency/adequacy judgements of
    different combination methods and . . . . . . . . . . . . . . . . . . . . . . 69
4.2 Per-segment correlation with human fluency/adequacy judgements of
    baselines and different decompositions. N = 1 parses . . . . . . . . . . . . 70
4.3 Considering γ and N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.4 Corpus statistics for the GALE 2.5 translation corpus . . . . . . . . . . . . 72
4.5