PRESENTED BY:
HRISHIKESH B
S7 CSE ALPHA
Univ reg:11012288
in the other.
Assign to every pair of sentences (S, T) a probability, Pr(T|S), to be
Pr(S|T) = Pr(T|S) Pr(S) / Pr(T)
The denominator on the right of this equation does not depend on S, so it suffices to choose the S that maximizes the product Pr(S) Pr(T|S):
argmax_S Pr(S|T) = argmax_S Pr(T|S) Pr(S)
Pr(T|S) is called the translation model (TM).
Pr(S) is called the language model (LM).
The LM should assign high probability to sentences which are good English.
Why not calculate Pr(S|T) directly, rather than break Pr(S|T) into two
likely in language S.
Pr(T|S) worries about the match between words, i.e. the match of each English word with its French counterpart.
The two can be trained independently
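The decomposition can be illustrated with a minimal sketch that picks, from a handful of candidate sentences S, the one that maximizes Pr(S) Pr(T|S). The candidate list, the function name `best_source`, and all probability values below are hypothetical and chosen only for illustration; they are not estimates from any corpus.

```python
def best_source(candidates, lm, tm, target):
    """Return the candidate S that maximizes Pr(S) * Pr(T|S)."""
    return max(candidates, key=lambda s: lm[s] * tm[(target, s)])

# Hypothetical language model Pr(S): the ungrammatical sentence gets a low score.
lm = {
    "John loves Mary": 0.020,
    "John love Mary": 0.001,
    "Mary loves John": 0.020,
}

# Hypothetical translation model Pr(T|S): it checks word match, not grammar.
target = "Jean aime Marie"
tm = {
    (target, "John loves Mary"): 0.30,
    (target, "John love Mary"): 0.30,
    (target, "Mary loves John"): 0.01,
}

best = best_source(list(lm), lm, tm, target)
print(best)
```

In this toy setting the translation model alone cannot separate "John loves Mary" from "John love Mary"; the language model breaks the tie, which is the point of keeping the two models separate.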
What is P(s)?
P(string s1 s2 s3 ... sn)
= P(s1, s2, s3, ..., sn)
Using the chain rule:
Pr(s1 s2 ... sn) = Pr(s1) Pr(s2|s1) ... Pr(sn|s1 s2 ... s_{n-1})
= P(s1) P(s2|s1) P(s3|s1,s2) P(s4|s1,s2,s3) ... P(si|s1, s2, ..., s_{i-1}) ...
Because there are so many histories, we cannot simply treat
sequence w_{n-1} w_n. The denominator on the right-hand side sums, over all words w in the corpus, the number of times w_{n-1} occurs before any word.
Since this is just the count of w_{n-1}, we can write the above equation as
P(w_n | w_{n-1}) = count(w_{n-1} w_n) / count(w_{n-1})
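The count ratio above can be sketched in a few lines. This is a minimal illustration on a made-up three-sentence corpus, not the procedure used on any real data; the `<start>` marker and the function name `bigram_probs` are assumptions introduced here. There is no smoothing, so unseen bigrams simply have no entry.

```python
from collections import Counter

def bigram_probs(sentences):
    """Estimate P(w_n | w_{n-1}) = count(w_{n-1} w_n) / count(w_{n-1})."""
    history = Counter()  # times each word occurs before any word
    bigram = Counter()   # times each ordered word pair occurs
    for sentence in sentences:
        words = ["<start>"] + sentence.split()
        for prev, cur in zip(words, words[1:]):
            bigram[(prev, cur)] += 1
            history[prev] += 1
    return {pair: c / history[pair[0]] for pair, c in bigram.items()}

# Hypothetical toy corpus, for illustration only.
corpus = ["all men are equal", "all men are mortal", "all is well"]
probs = bigram_probs(corpus)
print(probs[("all", "men")])
print(probs[("are", "equal")])
```

Here "all" is followed by "men" in two of its three occurrences, so P(men|all) comes out as 2/3, and "are" is followed by "equal" in one of two, giving 0.5.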
For example, to calculate the probability of the sentence "all men are equal", we split it up as
P(all men are equal) = P(all|start) P(men|all) P(are|men) P(equal|are)
where start denotes the start of the sentence, and P(all|start) is the probability that a sentence starts with the word "all".
Given the bigram probabilities in the table:
Bigram      Probability
start all   0.16
all men     0.09
men are     0.24
are equal   0.08
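With the four bigram probabilities listed above, the sentence probability is just their product: 0.16 × 0.09 × 0.24 × 0.08, which is about 0.00027648. A minimal sketch of that arithmetic follows; the function name `sentence_prob` is an assumption introduced here, and the values are the ones from the table.

```python
import math

# Bigram probabilities from the table, keyed as (previous word, word).
bigram_prob = {
    ("start", "all"): 0.16,
    ("all", "men"): 0.09,
    ("men", "are"): 0.24,
    ("are", "equal"): 0.08,
}

def sentence_prob(words, probs):
    """Multiply P(word | previous word) along the sentence, beginning at 'start'."""
    history = ["start"] + words[:-1]
    return math.prod(probs[(prev, cur)] for prev, cur in zip(history, words))

p = sentence_prob("all men are equal".split(), bigram_prob)
print(f"{p:.8f}")
```

Because the product shrinks quickly with sentence length, practical systems usually add log probabilities instead of multiplying raw ones.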
P(S|T) = P(T|S) P(S) / P(T), so the best S is the one that maximizes P(T|S) P(S)
We can think of the French sentence as being generated from the English sentence word by word.
Thus, in the sentence pair (Jean aime Marie | John loves Mary) we feel that John
produces Jean, loves produces aime, and Mary produces Marie. We say that a
word is aligned with the word that it produces.
Not all pairs of sentences are as simple as this example. In the pair (Jean n'aime
personne | John loves nobody), we can again align John with Jean and loves with
aime, but now, nobody aligns with both n' and personne.
Sometimes, words in the English sentence of the pair align with nothing in the
French sentence, and similarly, occasionally words in the French member of the
pair do not appear to go with any of the words in the English sentence.
words
not necessarily 1-to-1
Example
S = w1 w2 w3 w4 w5 w6 w7
T = u1 u2 u3 u4 u5 u6 u7 u8 u9
w4 -> u3 u5
fertility of w4 = 2
distortion w5 -> u9
P(i|j,l) = probability that the target word is at position i, given a source word at position j and target length l
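The example can be written down as a small data structure to make fertility and distortion concrete. This is only a sketch: the dictionary layout and the function names `fertility` and `distortion_events` are assumptions introduced here, and only the two alignment links stated in the example are encoded.

```python
# Alignment from the example: source position j -> target positions it produces.
# w4 -> u3 u5 and w5 -> u9; links for the other source words are not given.
alignment = {4: [3, 5], 5: [9]}
target_length = 9  # l, the length of T = u1 ... u9

def fertility(alignment, j):
    """Number of target words produced by the source word at position j."""
    return len(alignment[j])

def distortion_events(alignment, l):
    """List the (i, j, l) triples for which the model needs P(i|j,l)."""
    return [(i, j, l) for j, targets in sorted(alignment.items()) for i in targets]

print(fertility(alignment, 4))
print(distortion_events(alignment, target_length))
```

The fertility of w4 is 2 because it produces two target words, and the triple (9, 5, 9) records that the word at source position 5 ends up at target position 9 in a target sentence of length 9.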
Idea (Search):
construct best S incrementally
start with a highly likely word transfer
and find a valid alignment
extending candidate S at each step
(Jean aime Marie | * )
(Jean aime Marie | John(1) * )
Failure?
best S not a good translation
language model failed or
translation model failed
couldn't find best S
search failure
Parameter Estimation
English/French
from the Hansard corpus
100 million words
bilingual Canadian parliamentary proceedings
unaligned corpus
Language Model
P(S) from bigram model
Translation Model
how to estimate this with an unaligned corpus?
Used the EM (Expectation-Maximization) algorithm, an iterative algorithm for re-estimating probabilities
Need
P(u|w) for words u in T and w in S
P(n|w) for fertility n and w in S
P(i|j,l) for target position i and source position j and target length l
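To give a feel for how EM can estimate word translation probabilities from sentence pairs with no word-level alignment, here is a deliberately simplified sketch. It estimates only P(u|w) and ignores fertility and distortion entirely, so it is not the full model described above; the function name `train_word_translation`, the three-pair toy corpus, and the iteration count are assumptions made for illustration.

```python
from collections import defaultdict

def train_word_translation(pairs, iterations=20):
    """EM for word translation probabilities only.

    pairs: list of (source_words, target_words).
    Returns t with t[(u, w)] approximating P(u|w).
    """
    target_vocab = {u for _, tgt in pairs for u in tgt}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of w producing u
        total = defaultdict(float)  # expected count of w producing anything
        # E-step: share each target word among the source words in its sentence.
        for src, tgt in pairs:
            for u in tgt:
                norm = sum(t[(u, w)] for w in src)
                for w in src:
                    frac = t[(u, w)] / norm
                    count[(u, w)] += frac
                    total[w] += frac
        # M-step: re-estimate P(u|w) from the expected counts.
        for (u, w), c in count.items():
            t[(u, w)] = c / total[w]
    return t

# Hypothetical toy corpus (English source, French target), for illustration only.
pairs = [
    (["John", "loves", "Mary"], ["Jean", "aime", "Marie"]),
    (["John"], ["Jean"]),
    (["Mary"], ["Marie"]),
]
t = train_word_translation(pairs)
print(round(t[("Jean", "John")], 3), round(t[("aime", "loves")], 3))
```

The one-word pairs pin down John/Jean and Mary/Marie, and over the iterations the remaining probability mass for "loves" shifts toward "aime", even though no pair ever stated that link directly.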
sentence-aligned parallel corpus. For the translation model to work well, the corpus has to
be large enough that the model can derive reliable probabilities from it, and representative
enough of the domain or sub-domain (weather forecasts, match reports, etc.) it is intended
to work for.
Statistical MT techniques have not so far been widely explored for Indian languages. It
would be interesting to find out to what extent these models can contribute to the huge
ongoing MT efforts in the country.
[1] Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin, A Statistical Approach to Machine Translation, Computational Linguistics, 16(2), pages 79-85, June 1990.
[2] Warren Weaver, Translation (1949), in Machine Translation of Languages, MIT Press, Cambridge, MA, 1955.
[3] Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer, The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, 19(2), pages 263-311, June 1993.
[4] Masahiko Haruno and Takefumi Yamazaki, High-Performance Bilingual Text Alignment using Statistical and Dictionary Information, Proceedings of the 34th Conference of the Association for Computational Linguistics, pages 131-138, 1996.
[5] John Hutchins and Harold L. Somers, An Introduction to Machine Translation, Academic Press, 1992.
[6] Adam Lopez, Statistical Machine Translation, ACM Computing Surveys, 40(3), Article 8, August 2008.
[7] W. A. Gale and K. W. Church, A Program for Aligning Sentences in Bilingual Corpora, Computational Linguistics, 19(1), pages 75-102, 1993.