
The Journal of China Universities of Posts and Telecommunications
October 2012, 19(Suppl. 2): 140-146
www.sciencedirect.com/science/journal/10058885    http://jcupt.xsw.bupt.cn
A Chinese-English patent machine translation system based on the
theory of hierarchical network of concepts
ZHU Yun, JIN Yao-hong
CPIC-BNU Joint Laboratory of Machine Translation, Institute of Chinese Information Processing, Beijing Normal University, Beijing 100875, China
Abstract
Compared with ordinary text, patent text often has more complex sentence structures and more ambiguity caused by multiple verbs. To deal with these problems, this paper presents a rule-based Chinese-English patent machine translation (MT) system based on the theory of hierarchical network of concepts (HNC). In this system, the whole procedure is divided into three main parts: the semantic analysis of the source language, the transitional transformation from the source language to the target language, and the generation of the target language. The knowledge base and the rule set are obtained by manually analyzing the semantic features of a training set which contains more than 6 000 Chinese patent sentences, and a specific method of evaluation is provided for the experiment.
Keywords patent machine translation, semantic analysis, semantic features, transitional transformation
1 Introduction

To facilitate worldwide communication and cooperation,
patent literature often needs to be translated into multiple
languages. However, the volume of patent applications is huge and keeps growing, which puts increasingly heavy pressure on translation. In response to this pressure, the automatic translation of patent literature has become an important application area of MT. In 2008, NTCIR-7 listed patent MT as one of its evaluation tasks, and since then, this task has been run repeatedly [1].
Patent language, as a combination of technical language
and legal language, has some distinctive features.
Generally, patent literature has longer sentences, tedious
and rigorous expressions, and fixed format. All these
features cause more difficulties in MT. Nowadays, many of
the existing patent MT systems are developed from
ordinary MT systems, and researchers have not found a
special approach to handle the difficulties of patent
translation. As a result, their performance is lower than that of ordinary text translation.

Received date: 05-08-2012
Corresponding author: ZHU Yun, E-mail: diana_zhupier@hotmail.com
DOI: 10.1016/S1005-8885(11)60430-5
Although the characteristics of patent text bring such difficulties in translation, its unique way of expression can provide a great deal of information for semantic analysis. This paper does not use a statistical method. Instead, we focus on patent text, and by studying these features of expression and the rules for translating patents from Chinese to English, which can be obtained from a Chinese-English bilingual patent corpus, we present a rule-based system with three main steps for translating from the source language into the target language. The first step is the semantic analysis of the source language, after which the system obtains a shallow parsing tree. The second step is the transitional transformation from the source language to the target language, during which the positions of some nodes of the parsing tree are moved. The last step is the generation of the target language, in which the system gives the translation of every word based on its part-of-speech. The system is designed to work in this way in order to improve the performance of the translation.
This paper is organized as follows. In Sect. 2 we review related work. Sect. 3 discusses the strategy of the
translation. Sect. 4 introduces the algorithm and procedure.
Sect. 5 shows the experimental result and analysis. And in
the last section, we present some conclusions.
2 Related work
At present, MT systems generally adopt one of the following methods: the rule-based method, the statistical method, or the hybrid-strategy method. The rule-based method has the advantage that rules can describe language phenomena directly and accurately. However, rule writing takes a lot of time and manpower, and completeness and consistency are difficult to achieve. On the other hand, the statistical method is able to obtain results quickly. But due to data sparseness and other reasons, it lacks the capacity to deal with some particular language phenomena. Also, a large-scale bilingual corpus is not easy to obtain [2]. Despite these deficiencies, the statistical machine translation (SMT) method is widely used. According to the data provided by the organizers, 12 of the 15 groups in the patent translation task of the NTCIR-7 workshop used statistical methods [1]. In order to compensate for the shortcomings and deficiencies of both methods, the hybrid-strategy method, a combination of these two MT methods, has become a new trend.
For recognizing the main verb in a Chinese sentence, recent research has adopted the following approaches. Some recognize the predicate head by constructing a statistical decision tree model [3]. Some recognize the
predicate head according to the predicate head of the
corresponding English sentence [4]. Some combine a
rule-based method with a multi-feature-based method [5].
Some obtain the head verb candidates through the
conceptual symbol of Chinese characters or words [6].
Some identify the predicate head based on not only the
static and dynamic grammatical features of the candidate
predicate heads, but also the syntactic relations between
the subject and the predicate [7].
For chunking and shallow parsing, support vector machines (SVM) have been applied to shallow parsing [8]. Some present a memory-based learning approach [9]. In Ref. [10], the authors apply shallow parsing to Portuguese-Spanish MT.
To obtain dependency trees for different languages, earlier researchers mainly adopted statistical methods.
Some formalized weighted dependency parsing as
searching for maximum spanning trees (MSTs) in directed
graphs [11]. Some extract typed dependency parses of
English sentences from phrase structure parses [12]. In
Japanese, researchers analyze dependency structure based
on SVMs [13] and a cascaded chunking model [14]. Zhou
uses a hybrid model of phrase structure partial parsing and
dependency parsing [15].
Chinese-English MT often needs structural reordering. Researchers have presented a source-side reordering
method based on syntactic chunks for phrase-based
statistical MT [16]. Wang et al. proposed a reordering
method based on a set of syntactic reordering rules [17].
Our research is based on the HNC theory [18]. This theory designs the language concept space as a system with four levels of digital symbols, and each level has its own symbolic expressions. The theory proposes a method for semantically analyzing Chinese texts grounded in the characteristics of Chinese expression, which provides us with both a telescope and a microscope for observing natural language.
3 The strategy of translation based on HNC theory
In this section, we present the main task of every step and some semantic features used in our system. Note that HNC theory is not only a sound theoretical framework for analyzing the problem. More importantly, it also provides a unified symbolic representation of possible rules for solving our problem, which makes the software implementation of the rule-based natural language understanding problem readily feasible. In the following examples, some important HNC principles are applied to form the rules. The main HNC principle used in this paper is the lv principle, one of the strategies for sentence analysis based on the HNC theory [19-20]. In this principle, l means logic concepts, which can be divided into many types, all of which provide different kinds of information in different phases of recognition; v means verbs, which themselves also have many semantic features.
We define three kinds of chunks, the entity chunk, the adverbial chunk and the predicate, to represent a sentence in the shallow parsing tree. The entity chunk includes the subject, the object and the object complement. The adverbial chunk modifies the predicate, and it may describe the time, the location, the manner, etc. of the action.
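These three chunk types and the shallow parsing tree they form can be pictured as simple data structures. The sketch below is a minimal Python illustration; the class and field names are ours, not part of the HNC formalism.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    kind: str          # "entity", "adverbial", or "predicate"
    words: list        # the word sequence covered by this chunk
    role: str = ""     # for entity chunks: "subject", "object", or "complement"

@dataclass
class SS:              # a small sentence: one predicate plus its chunks
    chunks: list = field(default_factory=list)

@dataclass
class CS:              # a full sentence: root node over several SSs
    sss: list = field(default_factory=list)

# "This invention relates to a method": entity + predicate + entity
ss = SS(chunks=[Chunk("entity", ["本", "发明"], "subject"),
                Chunk("predicate", ["涉及"]),
                Chunk("entity", ["一种", "方法"], "object")])
assert [c.kind for c in ss.chunks] == ["entity", "predicate", "entity"]
```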
3.1 Semantic features
There are several lv concepts and their semantic features
used in our system, the most important of which are
introduced in this section.
1) lb
There is a certain kind of logic concept that indicates the small sentence (SS) separation. In our system, we call this kind of logic concept lb. It behaves like the conjunctions that connect sentences or clauses.
2) l0
A predicate often has a subject and an object. Usually, the sequence of a sentence is subject+predicate+object. In this case, the entity chunks are separated by the predicate. There is a kind of logic concept, l0, which is located between two entity chunks. Also, this kind of concept is sometimes required to change the sentence sequence, as in object+l0+subject+predicate (in the passive voice). The logic concepts l0 can be divided into two groups: one is followed by the subject and the other by the object.
3) l1 & l1h
The adverbial chunk has its own signal logic concepts. The logic concept l1 always appears at the beginning of an adverbial chunk. Sometimes, at the end of the adverbial chunk, there is a signal logic concept that indicates the boundary. We call this kind of logic concept l1h.
4) Verb
Verbs have particular semantic features. Some verbs can be followed by a clause as the object. Some verbs need to change into the passive voice when translated into English. Some verbs can have only two entity chunks, while others can have three. Each verb is given a score based on its own semantic features and the logic concepts it is related to.
3.2 Semantic analysis
In this step, the shallow parsing tree is obtained. This semantic analysis is based on the principle of boundary perception. The logic concepts and the verbs are used to divide a sentence into several chunks.
There are two key problems in Chinese sentence analysis. The first is that there may be several predicates in one Chinese sentence. As a result, one Chinese sentence should be separated into several parts according to the number of predicates. Each part is called an SS, which can form an independent syntactic tree. Generally, an SS is composed of several words and only one stop punctuation mark (our term in this paper for the set comprising the comma, colon, semicolon, period, question mark, and exclamation mark). However, sometimes an SS can have more than one comma, or a part of a sentence ending with a comma can be separated into more than one SS. The second problem is that a Chinese sentence may contain multiple verbs, which brings more ambiguity. Only the predicate, not every verb that occurs in a sentence, can play the role of dividing the sentence. Thus, a process of predicate recognition is needed in our system.
With the results of SS separation and predicate recognition, an SS can be divided by the relevant logic concepts and verbs. Then, a shallow parsing tree can be generated from the division.
1) SS separation
The process of SS separation mainly relies on the stop
punctuation mark and the logic concept lb.
Pattern 1 lb+word sequence+stop punctuation
In Pattern 1, the logic concept lb is followed by several words and a stop punctuation mark. It is a sign of the SS's beginning.
Pattern 2 word sequence+lb+word sequence+stop punctuation
Sometimes, the logic concept lb occurs in the middle of a sentence, but the parts of the sentence both before and after this word have predicates. As a result, this concept divides the whole sentence into two SSs.
SS separation is not based only on punctuation and the logic concept lb. A reasonable SS requires a predicate. Thus, if a division of a sentence does not have a predicate, it cannot be separated out as an SS.
2) Predicate recognition
The semantic features which play an important role in predicate recognition can be categorized into two groups. In one group, the semantic features raise the possibility of a verb being the predicate. On the contrary, the semantic features in the other group reduce this possibility.
The semantic features in the former group can be divided into three types. Some logic concepts or verbs combine with verbs to form a predicate candidate chunk. These logic concepts may change the tense, the voice and the aspect of the head verb. Due to these concepts, the verb they modify is in its finite form, which enhances its possibility of being selected as the predicate of the SS. Also,
the predicate chunk could have a structure of verb+verb.
In this kind of structure, the secondary verb could also
indicate the tense or the aspect of the head verb, or the
combination can indicate the concatenation of two actions.
Thus, the possibility of a predicate candidate chunk with
complex composition to be selected as the predicate is
increased.
The occurrence of some logic concepts changes the sequence of an SS. Some logic concepts indicate the beginning and the end of an adverbial chunk. This adverbial chunk modifies the following verb, and its position moves to the end of the sentence when the Chinese sentence is translated into English. In addition, the logic concept l0 would also change the sequence of the entity chunks.
Pattern 3 Entity Chunk1+l0+Entity Chunk2+Predicate+
Entity Chunk3
Pattern 4 Entity Chunk1+Predicate+Entity Chunk2+
Entity Chunk3
The chunk sequence in Pattern 3 may change into Pattern 4 after translation. We can therefore infer that, because the sentence sequence has changed, the possibility that the verbs related to these logic concepts are predicate candidates is increased.
There are two kinds of verbs with a high possibility of being the main verb. The first kind comprises verbs that are always followed by a clause as the object. The second kind comprises verbs that are typically used as the main verb in patent text, e.g. relate to/include/disclose/provide.
However, not all the semantic features we pay attention to have a positive influence that increases a related verb's possibility of being the predicate. On the contrary, some semantic features decrease this possibility.
In our system, there is a kind of logic concept, such as this/some of, represented by lu9. This kind of concept indicates that what follows is a noun or a noun phrase. This decreases the possibility that a verb is the predicate when it occurs after a lu9 concept.
Still, lu9 is not the only feature that prevents a candidate from being the predicate. The Chinese character de is generally used between things and their modifiers. As a result, if a verb occurs before de, it usually belongs to the modification or restriction. On the other hand, if a word occurs after de, it implies that this word is used as a noun.
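The competition between verb candidates described above can be viewed as additive scoring over positive and negative features. The sketch below is a simplified illustration: the feature inventory and weights are our own assumptions, not the system's actual rule set.

```python
# illustrative feature weights (assumed values, for demonstration only)
FEATURES = {
    "finite_marker": +2,   # a tense/voice/aspect logic concept modifies the verb
    "verb_verb":     +2,   # part of a verb+verb predicate candidate chunk
    "after_l0":      +2,   # related to a sequence-changing l0 concept
    "clause_object": +1,   # verb always followed by a clause as object
    "patent_verb":   +1,   # relate to / include / disclose / provide
    "after_lu9":     -3,   # occurs after a lu9 concept (this / some of)
    "before_de":     -3,   # occurs before 'de', i.e. inside a modifier
}

def score(features):
    return sum(FEATURES[f] for f in features)

def pick_predicate(candidates):
    """candidates maps each verb to its observed feature names;
    the highest-scored verb is selected as the predicate."""
    return max(candidates, key=lambda v: score(candidates[v]))

best = pick_predicate({
    "公开": ["patent_verb", "finite_marker"],  # 'disclose', a typical patent verb
    "处理": ["before_de"],                     # 'process', inside a modifier
})
assert best == "公开"
```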
3) Results of analysis
After the semantic analysis, the shallow parsing tree is obtained. Each kind of node has its own specific composition.
Each sentence (CS) ending with a full stop is the root of the tree, and it has several SS nodes and separators SST (lb or stop punctuation).
CS = SS + SST + SS + SST + ...
Then, an SS node is composed of one to three entity chunks and one predicate. The adverbial chunk and the entity chunk separator l0 are alternative.
SS=Adverbial Chunk [alternative] + Entity Chunk 1 [alternative] + Predicate + Entity Chunk 2 + Entity Chunk 3 [alternative]
If there is a logic concept l0 in the sentence, the structure of the SS changes into the following one.
SS=Adverbial Chunk [alternative] + Entity Chunk 1 [alternative] + l0 + Entity Chunk 2 + Predicate + Entity Chunk 3 [alternative]
As child nodes of the SS, the predicate and the adverbial chunk may unfold to the next level.
Adverbial Chunk=l1+word sequence+l1h[alternative]
Predicate=l/v[alternative]+head verb+l/v[alternative]
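The two SS templates above can be checked mechanically. The sketch below is our own encoding, not part of the paper's implementation: each slot carries an "alternative" flag, and a chunk sequence is matched greedily against the appropriate template.

```python
def ss_template(has_l0):
    """(chunk kind, alternative?) slots of an SS, with and without l0."""
    if has_l0:
        return [("adverbial", True), ("entity", True), ("l0", False),
                ("entity", False), ("predicate", False), ("entity", True)]
    return [("adverbial", True), ("entity", True), ("predicate", False),
            ("entity", False), ("entity", True)]

def matches(kinds, template):
    """Greedy left-to-right match; alternative slots may be skipped."""
    i = 0
    for kind, alternative in template:
        if i < len(kinds) and kinds[i] == kind:
            i += 1
        elif not alternative:
            return False
    return i == len(kinds)

assert matches(["entity", "predicate", "entity"], ss_template(False))
assert matches(["entity", "l0", "entity", "predicate"], ss_template(True))
assert not matches(["predicate"], ss_template(False))
```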
3.3 Sentence transformation
The main task of transitional transformation is to transform the original parsing tree into its legitimate form in the target language. It mainly involves three kinds of operations. The first kind is to add or delete a node of the parsing tree, the second is to change the position of a node in the tree, and the last is to change some attribute values of a node.
1) Relation between SSs
As we mentioned before, a Chinese sentence may have more than one SS. However, Chinese is a language that lacks inflection. Thus, if a Chinese sentence with multiple SSs cannot be translated into a compound sentence, the system needs to decide which SS is the independent clause and which types of dependent clause the other SSs become. Moreover, some SSs are changed into infinitive forms. To decide which form an SS should be transformed into, some logic concepts and the sharing relation are required. For example, an SS beginning with a coordinating conjunction or a correlative conjunction should take the form of an independent clause.
Pattern 5 Entity Chunk 1+Predicate 1+Entity Chunk
2+,+Predicate 2+Entity Chunk 3+.
In addition, the sharing relation helps to define the relation between SSs. Take Pattern 5 as an example: two SSs connected in this pattern usually indicate that Entity Chunk 2 is the subject of Predicate 2, which is omitted to avoid duplication. As a result, the second SS can be converted into an attributive clause.
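The conversion of Pattern 5's second SS into an attributive clause can be sketched as follows. This is a simplified illustration using English labels; the choice of relative pronoun is an assumption on our part.

```python
def to_attributive(ss1, ss2):
    """Pattern 5: E1 + P1 + E2 , P2 + E3.  Entity Chunk 2 is the omitted
    subject of Predicate 2, so the second SS becomes an attributive clause
    attached to it via a relative pronoun."""
    return ss1 + ["which"] + ss2

merged = to_attributive(["the system", "comprises", "a module"],
                        ["processes", "the signal"])
assert merged == ["the system", "comprises", "a module",
                  "which", "processes", "the signal"]
```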
2) Reorder the sequence of entity chunks
In Sect. 3.1, the logic concept l0 was introduced. With different l0, the sequences of chunks can change in different ways when we translate them into English. Some l0 are used before the subject. In this situation, the Chinese sentence is always in the following pattern.
Pattern 6 Entity Chunk 1(Object)+l0+Entity Chunk
2(Subject)+Predicate+Entity Chunk 3(Object 2/Object
Complement[alternative])
To translate this kind of sentence into English, the sequence of chunks should be changed into Pattern 7 or Pattern 8.
Pattern 7 Entity Chunk 2(Subject)+Predicate(in active voice)+Entity Chunk 1(Object)+Entity Chunk 3(Object 2/Object Complement[alternative])
Pattern 8 Entity Chunk 1(Object)+Predicate(in passive voice)+Entity Chunk 3(Object 2/Object Complement[alternative])+by+Entity Chunk 2(Subject)
Also, the l0 used before the object needs transitional
transformation.
Pattern 9 Entity Chunk 1(Subject)+l0+Entity Chunk
2(Object)+Predicate+Entity Chunk 3(Object 2/Object
Complement[alternative])
To translate a sentence in Pattern 9 into English, the sequence of chunks should be changed into Pattern 10.
Pattern 10 Entity Chunk 1(Subject)+Predicate(in active voice)+Entity Chunk 2(Object)+Entity Chunk 3(Object 2/Object Complement[alternative])
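Patterns 6-10 amount to permutations of the chunk sequence. A minimal sketch (function names are ours; chunks are represented as labels rather than real text):

```python
def reorder_subject_l0(chunks, passive=False):
    """Pattern 6 (Obj1 + l0 + Subj + Pred [+ Obj2]) reordered for English:
    active  -> Pattern 7: Subj + Pred + Obj1 [+ Obj2]
    passive -> Pattern 8: Obj1 + Pred(passive) [+ Obj2] + by + Subj"""
    obj, _l0, subj, pred, *rest = chunks
    if passive:
        return [obj, pred + "(passive)"] + rest + ["by", subj]
    return [subj, pred, obj] + rest

def reorder_object_l0(chunks):
    """Pattern 9 (Subj + l0 + Obj + Pred [+ Obj2]) -> Pattern 10."""
    subj, _l0, obj, pred, *rest = chunks
    return [subj, pred, obj] + rest

assert reorder_subject_l0(["Obj1", "l0", "Subj", "Pred"]) == ["Subj", "Pred", "Obj1"]
assert reorder_subject_l0(["Obj1", "l0", "Subj", "Pred"], passive=True) == \
    ["Obj1", "Pred(passive)", "by", "Subj"]
assert reorder_object_l0(["Subj", "l0", "Obj", "Pred", "Obj2"]) == \
    ["Subj", "Pred", "Obj", "Obj2"]
```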
3) Adverbial chunks moving
In a Chinese sentence, an adverbial chunk always occurs before the predicate. However, in English, an adverbial clause or prepositional phrase can be placed either at the beginning or at the end of a sentence. Hence, in the step of transitional transformation, if the adverbial chunk is at the beginning of the sentence and is set off by a comma, its position in the parsing tree does not change. But if it appears in the middle of a sentence, it is moved to the last node of the sentence's parsing tree.
Also, an adverbial chunk with both l1 and l1h needs an extra operation. Both logic concepts are omitted from the parsing tree, and a new node with a proper conjunction or preposition is added.
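The adverbial movement rule can be sketched in the same style. This is a simplification: the comma test for a sentence-initial adverbial and the l1/l1h replacement are only noted in comments rather than modeled.

```python
def move_adverbial(chunks):
    """Move a mid-sentence adverbial chunk to the end of the SS; a
    sentence-initial adverbial (set off by a comma in the original)
    keeps its place.  chunks: list of (kind, text) pairs."""
    if chunks and chunks[0][0] == "adverbial":
        return chunks                      # initial adverbial stays put
    others = [c for c in chunks if c[0] != "adverbial"]
    adverbials = [c for c in chunks if c[0] == "adverbial"]
    return others + adverbials             # moved to be the last node

ss = [("entity", "the device"), ("adverbial", "in the first step"),
      ("predicate", "heats"), ("entity", "the sample")]
assert move_adverbial(ss)[-1] == ("adverbial", "in the first step")
```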
3.4 Generation
The main task in the generation step is to determine the part-of-speech of each word and to transform each word into the proper form. To confirm the part-of-speech (POS) of an ambiguous word, our system mainly focuses on the POS of the previous word or the next word, based on the parsing tree after transitional transformation.
4 Algorithm of the translation
The architecture of the system is shown in Fig. 1.
Fig. 1 The system architecture
Our system is a rule-based system, and it contains four modules: word segmentation, semantic analysis, transitional transformation and generation. All these modules need the support of the knowledge base and rule sets. In the following parts, all these modules except word segmentation are discussed.
4.1 Semantic analysis
Step 1 After word segmentation, pre-segment all the sentences into SSs based on lb and the stop punctuation marks.
Step 2 Use l1, l1h and l0 to divide the SS.
Step 3 Give every verb in the SS its score. Then sort
all the verbs in descending order according to their score.
Select the one with the highest reasonable score as the
predicate.
Step 4 If no reasonable predicate is found in a pre-judged SS, combine that division with the following SS.
Step 5 Divide the SS into several chunks based on l1, l1h, l0 and the predicate, and confirm every chunk's type.
After these five steps, a shallow parsing tree of an SS is
obtained. A sentence with multiple SSs forms a parsing
forest.
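The five steps can be condensed into a toy end-to-end sketch. Step 2's l1/l1h/l0 division is omitted for brevity, and the punctuation set and verb scores are invented for illustration.

```python
STOP = {"，", "。"}                          # stop punctuation (subset)
VERB_SCORES = {"涉及": 3, "提供": 2}         # toy verb scores (assumed)

def analyze(tokens):
    # Step 1: pre-segment SSs at stop punctuation (lb handling omitted)
    sss, cur = [], []
    for t in tokens:
        if t in STOP:
            if cur:
                sss.append(cur)
            cur = []
        else:
            cur.append(t)
    if cur:
        sss.append(cur)
    # Steps 3-4: merge any pre-judged SS without a scoreable verb
    merged = []
    for ss in sss:
        if merged and not any(t in VERB_SCORES for t in merged[-1]):
            merged[-1] = merged[-1] + ss
        else:
            merged.append(ss)
    # Step 5: split each SS into chunks around its best-scored predicate
    trees = []
    for ss in merged:
        pred = max((t for t in ss if t in VERB_SCORES), key=VERB_SCORES.get)
        i = ss.index(pred)
        trees.append({"entity1": ss[:i], "predicate": [pred],
                      "entity2": ss[i + 1:]})
    return trees

trees = analyze("本 发明 涉及 一种 方法 。".split())
assert trees[0]["predicate"] == ["涉及"]
```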
4.2 Sentence transformation
Step 1 Determine the relations between SSs with lb
and the sharing relations. Confirm which ones should be
the independent clause, and which ones should be the
dependent clause. Add the proper conjunctions or delete
the redundant ones.
Step 2 Reorder the sequence of the entity chunks and
the predicate in an SS with l0. Modify the voice of the
predicate.
Step 3 Move the adverbial chunks to the proper
position. Delete l1 and l1h and select reasonable
conjunctions or prepositions.
Step 4 Modify the tense, the voice and the aspect of the verb in the predicate based on the logic concepts and verbs it is related to.
4.3 Generation
Step 1 Determine the part-of-speech of each word with
ambiguity according to the previous or the next word.
Step 2 Select the word with confirmed POS from the
dictionary and transform it into the correct form.
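These two generation steps can be sketched as a dictionary lookup plus a neighbor-POS rule. The lexicon entry and the disambiguation rule below are invented examples, not the system's actual dictionary.

```python
# toy lexicon: one ambiguous word with a noun and a verb reading (assumed)
LEXICON = {"处理": {"noun": "processing", "verb": "process"}}

def generate_word(word, prev_pos):
    """Step 1: disambiguate POS from the previous word's POS;
    Step 2: select the dictionary form for the confirmed POS."""
    senses = LEXICON.get(word)
    if senses is None:
        return word                         # unambiguous: pass through
    # after a determiner or adjective we expect a noun; otherwise a verb
    pos = "noun" if prev_pos in {"det", "adj"} else "verb"
    return senses[pos]

assert generate_word("处理", "det") == "processing"
assert generate_word("处理", "noun") == "process"
```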
5 Experiment and result analysis
The experiment collects 277 bilingual sentences as the main training set, selected by experts of the State Intellectual Property Office of the People's Republic of China (SIPO) from the patent corpus as sentences likely to introduce translation mistakes, and more than 6 000 sentences extracted from 30 real, whole Chinese patents as the secondary training set, which is used to build the dictionary and to supplement the collection of the rule set. A closed test is based on the main training set and an open test is based on the evaluation set.
The first step is to manually establish a knowledge base for all the words appearing in both the main and secondary training sets, which contains about 11 300 words and their semantic features. The second step is to summarize the rules for the different steps. By analyzing the semantic features of logic concepts and verbs in the main training set, we summarized rules and obtained rule sets containing almost 700 rules. Then we apply the human knowledge mentioned above to the system described in Sect. 4.
One important characteristic we emphasize in our system is the semantic analysis and the transitional transformation of the source language. Thus, we evaluate the key processes instead of using the BLEU score [21] to evaluate our system, since BLEU, in our opinion, mainly evaluates word selection and cannot reflect whether the sentence structure is correct. We select SS segmentation, predicate recognition, entity chunk reordering and adverbial chunk reordering as the four test points of our system and manually calculate the precisions.
The results of the closed test are shown in Table 1.
Table 1 Results of closed test set
                      Semantic analysis                   Transitional transformation
Total        P of SS             P of predicate    P of entity chunk    P of adverbial chunk
sentences    segmentation/%      recognition/%     reordering/%         reordering/%
277          92                  94                80                   90
To emphasize the importance of predicate recognition in semantic analysis and to show a comparison, we also sent these 277 sentences to Google online translation. The data for Google were obtained by analyzing its translation results. All the test results are shown in Table 2.
Table 2 Comparative test results on the closed test set
Total Detected Correct P/% R/% F/%
OURS 363 354 333 94.1 91.7 92.9
GOOGLE 363 288 217 75.3 59.8 66.7
In Table 2, OURS is the result obtained from our system, and GOOGLE is the one from Google online translation. From this table, we can see that the result from Google is much lower than the one from our system, partly because our system gives intermediate results directly, while Google's results must be inferred from the translation output. There are mainly two kinds of errors in Google's results. The first is that more than one verb brings ambiguity to predicate recognition and Google translation selects the wrong verb as the predicate, which, for the most part, affects the precision. The other is that the translation result often has no predicate because the correct predicate head is in a non-finite form, which mainly affects the recall. From this comparison, we can see that our method of identifying the predicate chunk performed well in the closed test.
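The P, R and F values in Table 2 follow from the three counts in the usual way. The sketch below reproduces both rows of the table from their counts:

```python
def prf(total, detected, correct):
    """Precision, recall and F1 (in %) from the counts of Table 2:
    P = correct/detected, R = correct/total, F = harmonic mean of P and R."""
    p = correct / detected
    r = correct / total
    f = 2 * p * r / (p + r)
    return round(100 * p, 1), round(100 * r, 1), round(100 * f, 1)

assert prf(363, 354, 333) == (94.1, 91.7, 92.9)   # OURS row
assert prf(363, 288, 217) == (75.3, 59.8, 66.7)   # GOOGLE row
```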
After the closed test, we ran the open test on the evaluation set. The test results are shown in Table 3.
Table 3 Results of open test set
                       Semantic analysis                   Transitional transformation
          Total       P of SS            P of predicate   P of entity chunk   P of adverbial chunk
          sentences   segmentation/%     recognition/%    reordering/%        reordering/%
Best      267         96                 91               94                  96
Worst     252         68                 46               43                  80
Average   6 308       83                 75               74                  85
From Table 3, we can see that all the precisions at the test points for the best result among the 30 patents are over 90%. The test results show that our method can improve the accuracy of parsing, and thus, based on a correct syntactic tree, the performance of the MT system can be improved. We continued to supplement and amend the rule set during the open test.
Through analyzing the experimental results, we found that, because of the purely rule-based method, the system depends heavily on the completeness of the rules and the accuracy of the knowledge base. The rules we have obtained so far are therefore not sufficient, and we need to add complementary rules for special language phenomena to improve the performance. Nevertheless, we can see that the system has room for improvement, and we believe that this method can be put into practical application once we complete the rules and the knowledge base.
6 Conclusions
In this paper, we present a rule-based system for Chinese-English patent machine translation. Satisfactory results were obtained in both the closed test and the open test. We propose this method in order to improve the performance of Chinese-English patent MT systems based on the syntactic tree we provide.
In the future, further work should be undertaken to complete the rules to enhance the performance of our semantic analysis and transitional transformation, and the entity chunks need to be unfolded to obtain more accurate analysis results.
Acknowledgements
This work was supported by the Hi-Tech Research and Development Program of China (2012AA011104) and the Fundamental Research Funds for the Central Universities.
References
1. Fujii A, Utiyama M, Yamamoto M, et al. Overview of the patent translation task at the NTCIR-7 workshop. NTCIR-7 Workshop Meeting, Japan, 2008
2. Dai X, Yin C, Chen J, et al. Machine translation: past, present, future. Computer Science, 2004, 31(11): 176-179, 184 (in Chinese)
3. Sui Z, Wen S. The acquisition and application of the knowledge for recognizing the predicate head of a Chinese simple sentence. Acta Scientiarum Naturalium Universitatis Pekinensis, 1998, 34(2-3): 221-230 (in Chinese)
4. Sui Z, Wen S. The research on recognizing the predicate head of a Chinese sentence in EBMT. Journal of Chinese Information Processing, 1998, 12(4): 39-46 (in Chinese)
5. Gong X, Luo Z. Recognizing the predicate head of Chinese sentences. Journal of Chinese Information Processing, 2003, 17(2): 7-13 (in Chinese)
6. Wei Z, Xiong L, Zhang Q. Research on automatic acquiring head verb of Chinese sentences. Computer Engineering and Applications, 2007, 43(10): 179-182 (in Chinese)
7. Li G, Meng J. A method of identifying the predicate head based on the correspondence between the subject and the predicate. Journal of Chinese Information Processing, 2005, 19(1): 1-7, 41 (in Chinese)
8. Kudo T, Matsumoto Y. Chunking with support vector machines. The Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, 2001: 1-8
9. Daelemans W, Buchholz S, Veenstra J. Memory-based shallow parsing. EACL'99 Workshop on Computational Natural Language Learning (CoNLL-99), 1999: 53-60
10. Garrido-Alenda A, Gilabert-Zarco P, Pérez-Ortiz J A, et al. Shallow parsing for Portuguese-Spanish machine translation. Tagging and Shallow Processing of Portuguese: workshop notes of TASHA'2003, 2004: 21-24
11. McDonald R, Pereira F, Ribarov K, et al. Non-projective dependency parsing using spanning tree algorithms. Conference on Human Language Technology and Empirical Methods in Natural Language Processing, 2005: 523-530
12. Tapanainen P, Järvinen T. A non-projective dependency parser. The Fifth Conference on Applied Natural Language Processing, 1997: 64-71
13. Kudo T, Matsumoto Y. Japanese dependency structure analysis based on support vector machines. SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 2000, 13: 18-25
14. Kudo T, Matsumoto Y. Japanese dependency structure analysis using cascaded chunking. 6th Conference on Natural Language Learning, 2002, 20: 1-7
15. Zhou M. A block-based robust dependency parser for unrestricted Chinese text. The Second Workshop on Chinese Language Processing, 2000, 12: 78-84
16. Zhang Y, Zens R, Ney H. Chunk-level reordering of source language sentences with automatically learned rules for statistical machine translation. SSST, NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation, 2007: 1-8
17. Wang C, Collins M, Koehn P. Chinese syntactic reordering for statistical machine translation. 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007: 737-745
18. Huang Z. HNC (hierarchical network of concepts) theory. CN: Tsinghua University Press, 1998 (in Chinese)
19. Jin Y. Natural language understanding based on the theory of HNC (hierarchical network of concepts). CN: Science Press, 2005 (in Chinese)
20. Jin Y. A hybrid-strategy method combining semantic analysis with rule-based MT for patent machine translation. 2010 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE), 2010: 1-4
21. Papineni K, Roukos S, Ward T, et al. BLEU: a method for automatic evaluation of machine translation. Technical Report RC22176 (W0109-022), IBM Research Report
