
Translation and the Web in the era of Machine Translation (MT)
How to use MT and write/edit texts
Hellmut Riediger, Gabriele Galati
Key words: machine translation, simplified writing, writing for the web, sustainable
communication

Abstract:
Machine Translation (MT) has become an essential tool for professional translators.
Post-Editing of Machine Translation (PEMT) is a key trend in the domain of professional
translation.
The volume of text requiring MT is on the rise. This trend has forced the market to look for
professionals able to analyse and post-edit the output of an automatic translator in
accordance with customer requirements. It also calls for widening the range of uses of MT
and improving the performance of MT as such.
Here we describe what MT can be used for, and which skills it requires, beyond
post-editing, especially in connection with web pages written in simplified language.


Introduction
In 1965 Italo Calvino wrote a famous article (Calvino 1980) in which he inveighed against
the "antilingua" (the anti-language) of bureaucracy. At the end of the article, he predicted
that:
"Our age is characterized by this contradiction: on the one hand we need everything that is
said to be immediately translatable into other languages; on the other we realise that every
language is a self-contained system of thought and by definition untranslatable. My
prediction is this: each language will revolve around two poles. One pole is immediate
translatability into other languages, which will come close to a sort of all-embracing,
high-level interlanguage; the other pole will be where the singular and secret essence of
the language, which is by definition untranslatable, is distilled and entrusted to diverse
linguistic institutions such as popular argot and the poetic creativity of literature."

Fifty years on, MT has become the major instrument of the first pole, i.e. "immediate
translatability". Despite the scepticism and resistance of many human translators, MT has
for several years been taking on a central role in the language and translation industry, to
the extent that in 2014 the MT market was worth 250 million dollars, with a growth trend
that appears irreversible (van der Meer 2014b).
Post-editing of machine translation (PEMT) is a skill increasingly demanded of
professional translators, while using MT for multilingual terminology searches and for
retrieving parallel texts is already part of run-of-the-mill translation assignments.
Technology is steadily improving, but a few limitations temper the easy enthusiasm of
those who think that all human input can at some point be done away with.
To some extent, the question is not whether humans will be replaced, but rather what can
be done to improve the performance of these systems in order to reduce costs, hand over
less gratifying tasks and create new professional opportunities.
One paramount question is how to write or edit source texts so that they are easily
intelligible and translatable, using simplified language and pre-editing (see also
Muegge 2007), in order to optimise:

- translation time and the related costs

- the texts for web users who regularly use online automatic translation to get the
gist of content in languages unknown to them.

We present the results of the research we have carried out on this issue in the Weaver
laboratory.

1 MT development and use


1.1 Development
Until a few years ago, scepticism about MT was widespread among translators.
A renowned spokesperson for the MT sceptics was Umberto Eco. His essay on translation,
"Say almost the same thing", opens with the chapter "Synonyms of Altavista" (Eco 2007,
25 et seq.), which presents a series of translations made in the late 1990s with Babel Fish,
the MT system then used by the Altavista search engine. The examples include hilarious
automatic translations of the Bible (Genesis) in several language combinations, as well as
the phrases listed in the following table:
English source text | Italian translation by Babel Fish
The Works of Shakespeare | Gli impianti di Shakespeare
Hartcourt Brace Brace | sostegno di Hartcourt
Speaker of the Chamber of Deputies | Altoparlante dell'alloggiamento dei delegati
Studies in the logic of Charles Sanders Pierce | Studi nella logica delle sabbiatrici Peirce del Charles

In 2014 we translated the same phrases using Google Translate and got these results:

English source text | Italian translation by Google Translate
The Works of Shakespeare | Le opere di Shakespeare
Hartcourt Brace Brace | Harcourt Brace
Speaker of the Chamber of Deputies | Presidente della Camera dei deputati
Studies in the logic of Charles Sanders Pierce | Studi nella logica di Charles Sanders Peirce

This is an undeniable improvement. What has happened in the past few years?
Modern MT systems use statistical approaches. In other words, they derive translations from
online corpora of multilingual, human-produced texts, which are constantly updated and
corrected through user contributions, by looking for identical or similar pre-existing texts
and their translations.
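
The idea of "looking for identical or similar pre-existing texts and their translations" can be illustrated with a deliberately tiny sketch. It is not how a production statistical MT engine works: real systems estimate probabilities over very large corpora. The two-entry corpus and the function name `lookup_translation` are assumptions made for illustration, and the similarity measure is simply the one offered by Python's standard `difflib` module.

```python
import difflib

# Hypothetical miniature English-Italian parallel corpus (illustration only).
PARALLEL_CORPUS = {
    "The Works of Shakespeare": "Le opere di Shakespeare",
    "Speaker of the Chamber of Deputies": "Presidente della Camera dei deputati",
}


def lookup_translation(segment, corpus=PARALLEL_CORPUS, cutoff=0.8):
    """Return the stored translation of the most similar source segment.

    Returns None when no stored segment reaches the similarity cutoff.
    """
    matches = difflib.get_close_matches(segment, list(corpus), n=1, cutoff=cutoff)
    return corpus[matches[0]] if matches else None


# A near-identical segment reuses the existing human translation;
# an unrelated segment finds nothing to reuse.
print(lookup_translation("The works of Shakespeare"))
print(lookup_translation("Coffee machine"))
```

The sketch also shows why such systems improve with use: every corrected pair added to the corpus becomes available for later lookups.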
In short, two things can be said about statistical MT systems:

- the more they are used (properly), the better they work;

- the longer they are online (in the "cloud" or other shared databases), the more
accessible they become to an ever larger number of people.

This is a revolutionary development. Research (see e.g. Plitt and Masselot 2010; Pym 2012)
has shown a significant increase in productivity, both in the classroom and at professional
level. The revolution affects technology and, above all, the role and social function of
translation itself.
TAUS, an organisation promoting MT, has coined the expression "Convergence Era",
meaning that MT is not an end in itself but an essential tool in an era in which content
must be available in all languages at all times. Translation turns into a utility in its
own right, like water, electricity or the internet, and merges into apps, search tools,
social media and the Internet of Things. It will not be perfect, but the need for real-time
communication prevails over linguistic excellence (cf. van der Meer 2014a).
What are the consequences for professionals and for translator training? We expect that
statistical MT, with its many hybrids, will soon turn many translators into post-editors,
pre-editors and managers of translation systems. There is an urgent need to rethink
the basic make-up of our training programmes and to update our translation skill models.

1.2 MT use and connected activities


- CAT-tool integrated MT

- CAT-tool integrated pre-MT

- Post-editing:
full post-editing: thorough revision aiming at the quality of a human translation, also
known as "publishable quality"
light post-editing: minimal revision aiming at a usable text, also known as "good
enough" or "fit for purpose" (cf. TAUS-Evaluation 2010)

- Pre-editing:
alteration of the source text (shortening, simplification) to ease post-editing and
make it economically viable.

1.3 Other uses


Use MT as a dictionary

Figure 1: MT of a term with Google Translate

An increasingly common practice is to use MT as a dictionary. Google Translate, for
instance, contains various bilingual dictionaries: a term is shown alongside the
corresponding dictionary entry. In addition, Google allows users to create their own
customised dictionaries to use along with MT. Many online dictionaries offer similar
resources.
Statistical MT is also widely used as a tool for finding equivalents of specialised terms
across sets of languages. In many cases, it gives better results in far less time than
dictionaries and traditional terminology databases.
Find parallel texts with MT
Tools such as 2lingual.com use MT to translate the search string entered by the user
automatically, and show the Google results retrieved in the two languages side by side on
the same page.
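
The mechanism can be sketched in a few lines. This is not the code behind 2lingual.com, which is not documented here: the `translate` function is a hypothetical stand-in for an MT engine (a one-entry lookup table), and the `lr` language-restriction parameter in the search URL is an assumption about the search engine's URL scheme.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for an MT engine: a tiny lookup table.
TOY_MT = {("en", "it"): {"coffee machine": "macchina da caffè"}}


def translate(query, source, target):
    """Return the toy translation of the query, or the query itself if unknown."""
    return TOY_MT.get((source, target), {}).get(query.lower(), query)


def search_url(query, lang, base="https://www.google.com/search"):
    """Build a search URL restricted to one language (the 'lr' parameter is assumed)."""
    return base + "?" + urlencode({"q": query, "lr": "lang_" + lang})


def bilingual_search_urls(query, source, target):
    """Return one search URL per language: original query and its translation."""
    return {
        source: search_url(query, source),
        target: search_url(translate(query, source, target), target),
    }


for lang, url in bilingual_search_urls("coffee machine", "en", "it").items():
    print(lang, url)
```

Showing the two result lists side by side then gives the translator a quick way to spot candidate parallel texts.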

Figure 2

2 Sustainable communication: the importance of using simplified languages and writing
Within the MT domain, controlled language means writing texts that consist of simple,
intelligible sentences with a basic vocabulary and simplified syntax. This kind of writing is
akin to other initiatives and recommendations linked to the many forms of sustainable
communication, such as the use of simplified languages and writing at European and
international level in various contexts (educational, technical, bureaucratic), web writing
and accessibility.
In short, a controlled language is based on a set of rules such as short sentences, a
basic vocabulary, simplified verb forms, limited subordination and explicit subjects.
You can check readability indexes with tools such as Gulpease, as found in MS Word.
We have summarized the essential principles of controlled language in the acronym
BASICO, with each letter standing for:
- Brief: fewer than 25 words per sentence or 500 characters per paragraph
- Active: prefer the active form to the passive; avoid gerunds and impersonal clauses
- Simple: use simple words; avoid rare or polysemous words and technical terms
- Incisive (or trenchant): one idea per sentence; no subordinate or non-restrictive clauses
- Clear: always state the subject and object; avoid pronouns and pronominal particles; do
not be afraid of repetition, and do not use synonyms for the same word
- Optimise for the destination medium
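
Two of these checks lend themselves to automation. The sketch below computes the Gulpease index with its published formula, 89 + (300 x sentences - 10 x letters) / words, where higher values mean easier text, and flags violations of the "Brief" rule. The sentence splitter and word tokenizer are naive assumptions (abbreviations such as "D." will be counted as sentence ends), and all function names are illustrative.

```python
import re

MAX_WORDS_PER_SENTENCE = 25    # "Brief": fewer than 25 words per sentence
MAX_CHARS_PER_PARAGRAPH = 500  # "Brief": fewer than 500 characters per paragraph


def split_sentences(text):
    """Naive sentence splitter (an assumption): break on '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def split_words(text):
    """Return runs of letters, so "dell'ambiente" counts as two words."""
    return re.findall(r"[^\W\d_]+", text)


def gulpease(text):
    """Gulpease index: 89 + (300 * sentences - 10 * letters) / words."""
    words = split_words(text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(len(w) for w in words)
    sentences = len(split_sentences(text))
    return 89 + (300 * sentences - 10 * letters) / len(words)


def brevity_issues(paragraph):
    """List the violations of the 'Brief' rule found in one paragraph."""
    issues = []
    if len(paragraph) >= MAX_CHARS_PER_PARAGRAPH:
        issues.append(f"paragraph has {len(paragraph)} characters")
    for sentence in split_sentences(paragraph):
        count = len(split_words(sentence))
        if count >= MAX_WORDS_PER_SENTENCE:
            issues.append(f"sentence has {count} words: {sentence[:40]}...")
    return issues


sample = "Poeta italiano. Studia grammatica e retorica."
print(round(gulpease(sample), 1), brevity_issues(sample))
```

The other BASICO rules (active voice, one idea per sentence, consistent terminology) need linguistic analysis and human judgement, so a checker of this kind is only a first filter before pre-editing.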

2.1 MT in the multilingual web


If you want to be competitive in foreign markets and attract traffic to your website, you
must speak the language of your clients, or at least make your texts intelligible to those
who do not speak your language. There are two ways to boost online visibility and reach
new language areas: translate your website into other languages, or make the site
accessible to those who use automatic translation tools. More and more people use MT to
translate web pages from languages they do not understand. For instance, in 2012 Google
Translate received 200 million translation requests per month (Och 2012), equivalent to
2.4 billion requests per year, 92% of which came from countries outside the US, for
languages other than English. If you write translatable texts, the potential number of
readers and users increases, to the benefit of internationalisation.

2.2 Example 1: Expo 2015


Milan Expo 2015 is a great opportunity to attract foreigners to Italy. Twenty million
visitors from over 140 countries, about two thirds of all the countries in the world, are
expected to visit the Expo, speaking more than 60 languages.
Regrettably, the Expo website is only available in Italian, English and French. Here is one
of the earliest texts published on the web, taken from the presentation section:
Expo Milano 2015 intende affrontare la tematica universale e complessa della
nutrizione da un punto di vista ambientale, storico, culturale, antropologico, medico,
tecnico-scientifico ed economico.
Tale impostazione multidisciplinare crea interessanti intrecci, correlazioni e
collegamenti: Expo Milano 2015 propone di affrontare il tema secondo una scansione
molto ampia, capace di interrogare e stimolare tutti i livelli della società, affinché
emerga la consapevolezza della vastità e della complessità dei fattori che
coinvolgono ognuno di noi.
Fin dal Dossier di candidatura il tema generale di Expo Milano 2015 "Nutrire il Pianeta,
Energia per la Vita" è stato declinato nei seguenti sottotemi:
1. Scienza e tecnologia per la sicurezza e la qualità alimentare;
2. Scienza e tecnologia per l'agricoltura e la biodiversità;
3. Innovazione della filiera agroalimentare [...]

The text posted on the site is 11,843 characters long: too long for a web page. Even for
Italian readers, such a text is convoluted and tortuous.

2.2.1 The following is an unedited MT of the above text into English

Expo 2015 thinks to address the issue of universal and complex nutrition from the
point of view of environmental, historical, cultural, anthropological, medical,
scientific-technical and economic. This multidisciplinary approach creates
interesting plots, correlations and connections: Milan Expo 2015 aims to address the
issue according to a scan very large, capable of interrogating and stimulate all levels
of society to show awareness of the vastness and complexity of the factors that
affect each our [...]

2.2.2 The following is an unedited MT of the above text into German

Expo Milano 2015 beabsichtigt, die Frage der universellen und komplexe Ernährung
aus anzugehen Sicht, historische organic, kulturelle, anthropologische,
medizinische, wissenschaftlich-technischen und wirtschaftlichen. Dieser Ansatz
multidisziplinäre schafft interesting Grundstücke, Zusammenhänge und
Verbindungen: Milan Expo 2015 zielt darauf ab, die Frage nach einem sehr groß Scan
ist, in der Lage, Abfragen und regen to Ebenen der Gesellschaft das Bewusstsein für
die Weite und Komplexität der Faktoren, die Einfluss auf die jeweils hervor Adresse
von uns [...]


2.2.3 Rewriting the original Italian text

This is an excerpt from the source text rewritten in a controlled language. We cut the text
from the original 11,843 characters to about 4,200.
Expo Milano 2015 "Nutrire il Pianeta, Energia per la Vita" ha individuato tre aree
tematiche che si sviluppano in diversi sottotemi:
A. Area tecnico-scientifica
Prende in esame i processi produttivi, le politiche e i meccanismi di mercato. I sottotemi
sono:
Scienza e tecnologia per la sicurezza e la qualità alimentare;
Scienza e tecnologia per l'agricoltura e la biodiversità;
Innovazione della filiera agroalimentare [...]
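
The size of the cut can be quantified with a short calculation. The figures are the two character counts reported in this section (11,843 for the original, about 4,200 for the rewrite); the function name is illustrative.

```python
def size_reduction_percent(original_chars, rewritten_chars):
    """Percentage by which the rewritten text is shorter than the original."""
    return 100 * (original_chars - rewritten_chars) / original_chars


# Character counts reported for the Expo text; the second one is approximate.
reduction = size_reduction_percent(11_843, 4_200)
print(f"The rewritten text is about {reduction:.1f}% shorter")
```

Since translation is usually charged by volume, a reduction of this order translates fairly directly into lower translation and post-editing costs.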

2.2.4 MT of the rewritten text into English

The following is an MT rendering into English using Google Translate. As the output shows,
rewriting the source improved the translation as well.
Milan Expo 2015 "Feeding the Planet, Energy for Life" has identified three areas that
develop in different sub-themes: a. Technical-scientific
It examines the processes, policies and market mechanisms. The sub-themes are:
Science and technology for safety and food quality;
Science and technology for the agriculture and biodiversity;
Innovation in the food industry [...]

2.2.5 MT of the rewritten text into German

Milan Expo 2015 Den Planet ernähren, Energie für das Leben hat drei Bereiche, die
in verschiedenen Unterthemen entwickeln identifiziert:
A. Technisch-wissenschaftliche
ER untersucht die Prozesse, Richtlinien und Marktmechanismen. Die Unterthemen
sind:
Wissenschaft und Technologie für Sicherheit und Lebensqualität.
Wissenschaft und Technik für die Landwirtschaft und Biodiversität [...]

2.3 Example 2: research into the performance of MT


In 2013 we conducted a study of the efficacy of MT (Galati and Riediger 2013) based on an
adequacy criterion. Using nine online translators, we translated a brief text taken from
the Encyclomedia into six languages.
We rated the acceptability of each translation on the following scale:
Acceptability
3 - perfectly acceptable translation; needs no revision
2 - intelligible translation requiring stylistic adjustments
1 - intelligible translation with grammatical inconsistencies and errors of language
and style
0 - unintelligible translation; requires rewriting
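
A scale of this kind is easy to encode and aggregate. The sketch below is one possible way to do it, not the procedure actually used in the study: the engine names and ratings are hypothetical, invented purely to show the mechanics.

```python
from enum import IntEnum
from statistics import mean


class Acceptability(IntEnum):
    """The four-point acceptability scale described above."""
    UNINTELLIGIBLE = 0      # requires rewriting
    ERRORS = 1              # intelligible, with grammar, language and style errors
    STYLE_ADJUSTMENTS = 2   # intelligible, needs stylistic adjustments
    PERFECT = 3             # needs no revision


def mean_scores(ratings):
    """Average the ratings per engine (dict: engine name -> list of scores)."""
    return {engine: mean(int(s) for s in scores) for engine, scores in ratings.items()}


# Hypothetical ratings, invented for illustration; not results from the study.
ratings = {
    "engine_a": [Acceptability.PERFECT, Acceptability.STYLE_ADJUSTMENTS],
    "engine_b": [Acceptability.ERRORS, Acceptability.UNINTELLIGIBLE],
}
averages = mean_scores(ratings)
best = max(averages, key=averages.get)
print(averages, best)
```

Running the same aggregation on the ratings for the source text and for the rewritten text gives a direct measure of what pre-editing gains.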
For comparison, the source text and the rewritten text are listed side by side below:

Source text: Poeta italiano.
Rewritten text: Poeta italiano.

Source text: Studia grammatica e retorica, entrando in contatto con i principali esponenti
dell'ambiente culturale fiorentino, tra cui Brunetto Latini, Lapo Gianni, Guittone d'Arezzo
e Guido Cavalcanti.
Rewritten text: D. studia grammatica e retorica, entrando in contatto con i principali
rappresentanti dell'ambiente culturale fiorentino, tra cui Brunetto Latini, Lapo Gianni,
Guittone d'Arezzo e Guido Cavalcanti.

Source text: A partire dal 1292 partecipa alla vita politica del comune fiorentino,
sostenendo la fazione guelfa contro i ghibellini.
Rewritten text: A partire dal 1292 egli prese parte alla vita politica del comune
fiorentino, sostenendo la fazione guelfa contro i ghibellini.

Source text: Dopo la spaccatura dei guelfi, nel 1302 viene esiliato dai Neri.
Rewritten text: Dopo la spaccatura dei guelfi, nel 1302 D. venne esiliato dai Neri.

Source text: Con il trattato De vulgari eloquentia conferisce al volgare dignità di lingua
letteraria e scientifica, come lui stesso dimostra nel Convivio.
Rewritten text: Con il trattato De vulgari eloquentia D. promuove l'uso del "volgare" nelle
scienze e in letteratura, come lui stesso dimostra nel Convivio.

Source text: Esponente del "dolce stil novo", ha tra le sue opere più importanti la Vita
Nova, il De Monarchia e la Commedia.
Rewritten text: Rappresentante del "Dolce stil novo", egli ha tra le sue opere più
importanti la Vita Nova, il De Monarchia e la Commedia.

The study showed that MT output clearly improves when the source text is pre-edited.

2.4 Example 3: PROMAC site


In 2014 we conducted a different study on a sample from the multilingual website of an
Italian coffee machine manufacturer, Promac Italy. The study compared the quality of the
English, French, Spanish and German translations published on the website with Google MT,
applying the acceptability criterion described above. The two texts examined were rather
different: one more descriptive, about the company; the other more technical, about its
products. First, we assessed the quality of the translations on the website, which ranged
from poor to sufficient. When the original Italian source text was used, MT quality was on
average equal to or lower than that of the website translations, with results varying widely
by language and text type. After light pre-editing of the Italian text, however, Google MT
on average fared better than the human translations on the site.

Figure 3

The study confirms a previously verified hypothesis: the combined use of simplified
writing and MT can be rewarding in terms of both quality and cost.

2.5 Conclusions: why is a "controlled language" useful and profitable?


Drafting Italian texts in a controlled language makes them:

- intelligible to ordinary readers

- intelligible to web page readers

- easily translatable by anyone who knows our language

- better translated by MT

- available via MT to those who do not know our language

Editing texts in a controlled language also saves money on translation.
The saving comes from simplifying the texts and reducing their overall length.
It also comes from easing the human translator's work:

- the final text is easier to translate, so the translator needs less time

- the final text can be translated automatically and then revised by the translator

Finally, a simplified text allows web users to run MT into other languages and obtain a more
intelligible translation.

3 Bibliography
Calvino, I. "Per ora sommersi dall'antilingua." In Una pietra sopra, by Italo Calvino, 122-126. Torino:
Giulio Einaudi Editore, 1980.
European Commission. How to write clearly. 2010.
http://ec.europa.eu/translation/writing/clear_writing/how_to_write_clearly_en.pdf (accessed
March 30, 2015).
Eco, U. Dire quasi la stessa cosa - Esperienze di traduzione. Milano: Bompiani, 2007.
Galati, G. & H. Riediger. "Traduttori automatici online gratuiti nell'attività di redazione editoriale
Ricerca 2013." Laboratorio Weaver. April 30, 2013.
http://www.fondazionemilano.eu/blogpress/weaver/migliori-traduttori-automatici-2013/ (accessed
March 30, 2015).
De Mauro, T. Guida all'uso delle parole. Roma: Editori Riuniti, 1983.
Muegge, U. Controlled language: The next big thing in Translation. January 2007.
http://works.bepress.com/cgi/viewcontent.cgi?article=1003&context=uwe_muegge (accessed
March 30, 2015).
Netzwerk-Leichte-Sprache. Netzwerk-Leichte-Sprache. 2008. http://www.leichtesprache.org
(accessed March 30, 2015).
Nielsen, J. Web usability. Milan: Apogeo, 2000.
Och, F. "Breaking down the language barrier." Google-official blog. April 26, 2012.
http://googleblog.blogspot.it/2012/04/breaking-down-language-barriersix-years.html (accessed
March 30, 2015).
Plitt, M. & F. Masselot. "A Productivity Test of Statistical Machine Translation Post-Editing in a
Typical Localisation Context." The Prague Bulletin of Mathematical Linguistics, January 2010: pp. 7-16.
Pym, A. "Translation Skill-sets in a Machine-translation Age." Meta, 2012.
TAUS-Evaluation. "Guidelines for automatic translation postrevision." TAUS. November 2010.
https://www.taus.net/think-tank/best-practices/postedit-best-practices/machine-translation-postediting-guidelines (accessed March 30, 2015).
Van der Meer, J. & A. Ruopp. "MT Market Report 2014." TAUS-Enabling Better Translation. August
2014. https://www.taus.net/think-tank/reports/translate-reports/mt-market-report-2014
(accessed March 30, 2015).
Van der Meer, J. "The Convergence Era, Translation as a utility." March 27, 2014.
https://www.brighttalk.com/webcast/9273/76923 (accessed March 30, 2015).
