The Modular Assessment Pack: a new approach to translation quality assessment at the
Directorate General for...

Article in Perspectives: Studies in Translatology, May 2016
DOI: 10.1080/0907676X.2016.1167923

Perspectives
Studies in Translatology

ISSN: 0907-676X (Print) 1747-6623 (Online) Journal homepage: http://www.tandfonline.com/loi/rmps20

The Modular Assessment Pack: a new approach to translation quality assessment at the
Directorate General for Translation

Roberto Martínez Mateo, Silvia Montero Martínez & Arsenio Jesús Moya Guijarro

To cite this article: Roberto Martínez Mateo, Silvia Montero Martínez & Arsenio Jesús
Moya Guijarro (2016): The Modular Assessment Pack: a new approach to translation
quality assessment at the Directorate General for Translation, Perspectives, DOI:
10.1080/0907676X.2016.1167923

To link to this article: http://dx.doi.org/10.1080/0907676X.2016.1167923

Published online: 02 May 2016.

Download by: [UGR-BTCA Gral Universitaria], [Silvia Montero Martínez] Date: 04 May 2016, At: 04:57
PERSPECTIVES, 2016
http://dx.doi.org/10.1080/0907676X.2016.1167923

The Modular Assessment Pack: a new approach to translation quality assessment at the
Directorate General for Translation

Roberto Martínez Mateo (a), Silvia Montero Martínez (b) and Arsenio Jesús Moya Guijarro (a)

(a) Department of Modern Philology, University of Castilla-La Mancha, Cuenca, Spain;
(b) Department of Translation and Interpreting, University of Granada, Granada, Spain

ABSTRACT
This paper presents a conceptual proposal for Translation Quality Assessment (TQA) and
its practical tool as a remedy to the deficiencies detected in a quantitative quality
assessment tool developed at the Directorate General for Translation (DGT) of the
European Commission: the Quality Assessment Tool (QAT). The new theoretical model, the
Functional-Componential Approach (FCA), takes on a functionalist and holistic quality
definition to solve the theoretical shortcomings of the QAT. Thus, it incorporates the
complementary top-down view of a qualitative module to build up a quality measurement
tool, the aim of which is to increase inter- and intra-rater reliability. Its practical
tool, the Modular Assessment Pack (MAP), is tested using a pretest-posttest methodology
based on an ad hoc corpus of real assignments translated by professional freelance
translators. The results of this experimental pilot study, carried out with the
English-Spanish language pair at the Spanish Language Department of the DGT, are
described and discussed. This analysis sheds some light on the benefits of adopting a
mixed bottom-up top-down approach to quality assessment and reveals some weaknesses of
the FCA which suggest the methods of further research. Although small-scale, the
findings of this pilot study indicate that improvements can be achieved by remedying its
limitations in broader experimental conditions and adjusting the tool for use in other
language combinations.

ARTICLE HISTORY
Received 4 March 2015
Accepted 29 February 2016

KEYWORDS
Translation Quality Assessment (TQA); quantitative models; qualitative models;
Functional Componential Approach (FCA); Modular Assessment Pack (MAP)

1. Introduction
This article reviews two methodologies for Translation Quality Assessment (TQA) in
order to address the weaknesses detected in the Quality Assessment Tool (QAT), a
prototype quantitative tool developed by the Directorate General for Translation (DGT).
Discussions on how to determine the quality of a translation tend to be linked to
relativity and subjectivity. This is partly because of the blurred borders of the
concept of quality itself (Bowker, 2001, p. 347) and partly because of the necessary
participation of a rater (the human factor) in the assessment (House, 1997, p. 47). That
is why, currently, in TQA, the aim is to limit subjectivity and raise interobjectivity
(Gerzymisch-Arbogast, 2001, p. 238).

CONTACT Roberto Martínez Mateo, Roberto.martinez@uclm.es

© 2016 Informa UK Limited, trading as Taylor & Francis Group
There is no commonly accepted single model or methodology in TQA (Colina, 2009).
On the one hand, the definitions of quality and the model of assessment have not been
empirically verified (Martínez & Hurtado, 2001, p. 274) and, on the other, no model or
methodology takes into account the combined textual, contextual and functional
aspects of a translation. These product-centered methods are divided into two mainstream
approaches: qualitative (Williams, 1989), holistic (Waddington, 2000) or theoretical
(Colina, 2009) models that offer a global assessment from a macrotextual viewpoint
(top-down); and quantitative (Williams, 1989), analytic (Waddington, 2000) or practical
(Colina, 2009) models that offer a microtextual approach (bottom-up). The latter are more
widely used in professional settings.
With a view to building a new theoretical approach to TQA with a practical tool, we
carried out a critical examination of the most representative quantitative and qualitative
models.¹ Quantitative tools (also known as metrics) include: the SICAL (Système
Canadien d'appréciation de la Qualité Linguistique) (Williams, 1989, p. 14); the LISA
Quality Assurance Model (LISA, 2007, p. 43); the SAE J2450 (SAE J2450, 2001, pp. 1–3); the
TAUS Dynamic Quality Evaluation (QE) Model, one of the latest and most significant
contributions to TQA, developed by O'Brien (2012) in collaboration with the Translation
Automation User Society (TAUS);² and the QAT (EC, 2009, pp. 11–12), a prototype tool
developed by the DGT of the European Commission to quantify the quality of
external translations.
All these stand-alone tools apply quality control procedures (Parra, 2005) (SAE J2450
also allows for quality assurance). However, these TQA metrics present some common
weaknesses: they rely on rating scales that lack an explicit theoretical base (Colina,
2008, 2009; Jiménez-Crespo, 2011); they rely on the central concept of error as the defining
element of their assessment model and, subsequently, on related issues such as error type,
severity and error weightings, which sometimes present an unclear definition (Parra,
2005); they shape their definition of a quality translation as an error-free text or a text
with a number of errors (their allocated points) that does not surpass a predefined limit
(acceptability threshold); they consider the notion of error as absolute, disregarding its
functional value (Martínez & Hurtado, 2001), and they identify and tag errors in isolation
rather than in relation to their context and function within the text (Nord, 1997). The line
that separates the error categories is sometimes so thin or blurred that different reviewers
might classify the same error into different categories, and the search for errors is limited
to the word and sentence level without taking into account the larger unit of the text or the
communicative context (Colina, 2008, 2009; Nord, 1997; Williams, 2001). The reviser
carries out a partial revision (Parra, 2007) of the selected sample, so the representativeness
of the limited, variable-length sample could be questioned (Gouadec, 1981; Larose, 1998);
these metrics do not specify what type of revision has to be made (i.e. unilingual,
comparative, etc.; Parra, 2005).
Despite these drawbacks, these quantifying systems fill a gap in the professional TQA
arena (Jiménez-Crespo, 2011), in which translation becomes a business with constraints
of time (De Rooze, 2006) and budget (O'Brien, 2012). Moreover, metrics have the
following advantages: shared repetitive macro-error categories; a clear quality
categorization, with an acceptability threshold and different quality ranges; and the fact
that assessment relies on a predetermined error classification and transparent error
weightings, known a priori by all parties involved (Schäffner, 1998). In addition, these
metrics raise expectations of a high inter-rater reliability (Colina, 2008, 2009), offering
results that are valid, justified and defensible. Metrics also bestow systematicity
and reproducibility on a process that necessarily requires human intervention
(Hönig, 1998).
As for the qualitative theoretical models in TQA, the most useful for this study are
Colina's framework (Colina, 2008, 2009) and the American Translators Association
(ATA) rubric for grading (v. 2011).³ Both Colina's framework and the ATA rubric adopt a
textual and a functional approach to TQA, as they analyze the product of the translation
(a text) taking the intended purpose of the translation as the key criterion to determine
its linguistic quality. They rely on a double-entry table that relates dimensions
(assessment criteria that correspond to smaller units of the quality construct in
translation), command levels and, at the intersection, level descriptors (Barberà &
De Martín, 2009, p. 99). The success of this tool lies in the right choice and definition of
these three items.
The advantages of rubrics are: they provide a reference framework for the rater that
facilitates decision-making based on limited, known and transparent criteria, which
limits the subjective burden inherent to any assessment process with human intervention;
it is only necessary to allot positive or negative values to the descriptors to offer a
summative valuation; they assess translation as a product in a given instance from a
top-down approach, which offers a general valuation of the text; and, as they are based on
descriptive propositions (descriptors), dimensions simplify the rater's task and allow the
rater to concentrate on the inadequacies from the medium range of the quality continuum
(Jiménez-Crespo, 2009, p. 76).
On the other hand, the following deficiencies have been noticed: rubrics have to
adequately describe the object they define and descriptors have to convey the essence of
the feature they aim to assess; currently, there is no experimental verification of the
existing models of translation competence, so the identification of the dimensions and
command levels has been based on those models which enjoy a long tradition and
dissemination; and when the translator only receives the final score, the rubric's capacity
to offer meaningful information about the overall quality may be diminished (Simon, 2001).
To sum up, while practical models have an extensive application, they also have limited
transferability, as they lack theoretical foundation and empirical validation (Colina,
2009, p. 237; Jiménez-Crespo, 2011, p. 316). Also, these models rely on error
quantification, a central issue in the debate about academic and professional assessment
(Jiménez-Crespo, 2011; Kussmaul, 1995). Meanwhile, theoretical models offer a global view,
but they lack the required applicability that professional translation demands. In this
context, this paper introduces a new tool for TQA, based on the Functional-Componential
Approach (FCA) (Martínez-Mateo, 2014a). This proposal aims to remedy the deficiencies
found in the freelance translation quality assessment tool devised at the Spanish
Department of the DGT, by combining both mainstream methodologies and embedding the
theoretical underpinnings of the Skopos theory. It also describes and discusses the
preliminary results of a small-scale, exploratory pilot study, carried out with the
Modular Assessment Pack tool (MAP) (a quantitative- and qualitative-based application).
Finally, some observations on the MAP's potential adjustments to other professional
contexts are made.

2. Outsourced translations and quality assessment at the DGT


The DGT, the European Commission translation body, handles a huge translation
workload, which forces it to outsource part of it to freelance translators. The legal
framework of the DGT requires an assurance of the quality of their translations (EC, 2012,
p. 17); thus, in-house translator reviewers, with the aid of linguistic guidelines and an
error typology, assess every translated text.⁴

2.1. The traditional approach and QAT


The quality that the DGT seeks for its translations is summarized by the functional motto
'fit-for-purpose' (EC, 2009, p. 11); although, in fact, the DGT does not implement a
functional quality notion or a functional assessment methodology (Martínez-Mateo, 2014b,
p. 85). The traditional procedure consists of an in-house reviewer carrying out quality
control on a randomly extracted sample text of about 10% of the target text (TT) (2–10
pages) to find errors and correct them.⁵ The reviewer relies on an eight-category error
typology (sense (SENS), omission (OM), terminology (TERM), reference documents (RD),
grammar (GR), spelling (SP) and punctuation (PT)), which classifies errors according to
their severity (high and low).
After an internal audit in 2008, the DGT launched the Program for Quality Management
in Translation: 22 Quality Actions to tackle some deficiencies. Action 4 deals with
the shortcomings found in the freelance translation assessment system and proposes
adopting a professional approach using quantitative criteria as much as possible. So, in
2009, the QAT was developed. This quantitative tool is based on the DGT's error typology
and on three textual profiles (general, technical and political). It aims to quantify quality
by error identification. The weight of each error depends on the text type and on the
error's category and severity. The final weight aggregate is subtracted from an initial
bonus, taking into account the number of pages reviewed (100 points per five pages). The
final mark places the translation within a section of the DGT's quality categorization
scheme (EC, 2012, p. 16; Table 1).
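
The scoring logic just described can be illustrated with a minimal sketch. The grade bands are those of Table 1 and the bonus rule (100 points per five pages) is the one stated above; everything else is an assumption made for illustration. In particular, the weight values are hypothetical (the source does not give them), and rescaling the result to a 0–100 mark is one plausible reading of how the page-dependent bonus is reconciled with the 0–100 bands, not a documented feature of the QAT.

```python
from dataclasses import dataclass

# Grade bands from Table 1 (lower bound of each band, label).
GRADE_BANDS = [
    (86, "Very Good"),
    (70, "Good"),
    (60, "Acceptable"),
    (40, "Below average"),
    (0, "Unacceptable"),
]

# HYPOTHETICAL weights keyed by (text profile, error category, severity).
# The real QAT values are not given in the source.
ERROR_WEIGHTS = {
    ("general", "SENS", "high"): 10,
    ("general", "SENS", "low"): 4,
    ("general", "TERM", "high"): 8,
    ("general", "TERM", "low"): 3,
    ("general", "GR", "high"): 5,
    ("general", "GR", "low"): 2,
}


@dataclass(frozen=True)
class Error:
    category: str  # e.g. "SENS", "TERM", "GR"
    severity: str  # "high" or "low"


def initial_bonus(pages_reviewed: float) -> float:
    """Initial bonus: 100 points per five pages reviewed (from the text)."""
    return 100.0 * pages_reviewed / 5.0


def qat_mark(profile: str, errors: list, pages_reviewed: float) -> float:
    """Subtract the aggregated error weights from the bonus.

    ASSUMPTION: the remainder is rescaled to 0-100 and floored at zero so
    that it can be read against the bands of Table 1.
    """
    bonus = initial_bonus(pages_reviewed)
    penalty = sum(ERROR_WEIGHTS[(profile, e.category, e.severity)] for e in errors)
    return max(0.0, 100.0 * (bonus - penalty) / bonus)


def grade(mark: float) -> str:
    """Map a 0-100 mark to the quality category of Table 1."""
    for lower_bound, label in GRADE_BANDS:
        if mark >= lower_bound:
            return label
    return "Unacceptable"


sample_errors = [Error("SENS", "high"), Error("TERM", "low"), Error("GR", "low")]
five_page_mark = qat_mark("general", sample_errors, pages_reviewed=5)
ten_page_mark = qat_mark("general", sample_errors, pages_reviewed=10)
print(five_page_mark, grade(five_page_mark))
print(ten_page_mark, grade(ten_page_mark))
```

Under these assumed weights, the same three errors weigh less heavily on a ten-page sample than on a five-page one, which is the effect the page-dependent bonus is meant to produce.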
When the QAT was put to the test in 2009 it produced unsatisfactory results, basically
due to the following reasons (Martínez-Mateo, 2014b, pp. 85–86): the QAT lacks a
theoretical foundation, as do other similar quantitative tools; it relies on analysis of
error type and severity but disregards an error's nature, a key feature in determining the
error's impact on the target text; it views errors as absolute (Martínez & Hurtado, 2001,
p. 281), disregarding the error's relative value; and analysis is only carried out at sentence
level, ignoring supratextual, pragmatic, communicative and functional issues.

2.2. The FCA and the MAP


In line with previous empirical studies (Colina, 2008, 2009; Jiménez-Crespo, 2009;
Waddington, 2000), the FCA embeds the 'fit-for-purpose' motto as a ruling principle in TQA,
thus taking on a functional approach to error and a componential view of quality.

Table 1. DGT's quality assessment scheme.

Grade    Unacceptable   Below average   Acceptable   Good    Very Good
Points   0–39           40–59           60–69        70–85   86–100

Since Skopos theory (Reiss & Vermeer, 1996) establishes a link between a translation's
quality and its adequacy for purpose, it can be asserted that a quality translation
has to be functionally, pragmatically and textually adequate (Colina, 2009; Nord, 1997,
p. 137). Likewise, error definition and error typology in the FCA have to take into
account the relative and functional value of error within the situational context
(Nord, 1997, p. 73). Consequently, this approach integrates the traditional, quantitative
bottom-up approach with a new, qualitative top-down approach that allows for the
consideration of suprasegmental issues (Waddington, 2000, p. 394). While these two
elements start from opposite ends of textual analysis, by providing them with functional
quality criteria, they each become part of a qualitative-quantitative methodological
continuum (Orozco, 2001, p. 102).
This functional and componential approach to TQA materializes in the MAP, a tool
comprised of two modules. The qualitative module is a four-dimensional assessment
rubric (see Section 3.3.1), in which the dimensions constitute the construct used by
the FCA to describe a good translation, relating it to the functional notion of adequacy.
Every dimension has several descriptors with associated points (see Appendix 1). The
metric module, with a calculator interface, includes an error typology with allotted
points. Therefore the MAP offers two quality indicators of the text, which provide a
comprehensive approach to translation from a macrotextual and microtextual
viewpoint.

3. Methods and materials


Any research on translation quality quantification must adopt an empirical approach
(Rothe-Neves, 2002, p. 114). Hence, a preliminary pilot study based on a corpus
(Orozco, 2001, p. 107) and all the materials needed to implement it in the DGT were
designed.

3.1. Pretest-posttest experimental design


The pretest-posttest (Sans, 2004, p. 185) or repeated measurement (Neunzig, 2002, p. 85;
Wimmer, 2011, p. 170) is the study type that best fits the experimental conditions. Here,
the pretest is the QAT test that was done by the Spanish Department of the DGT in real
working conditions in 2009. Its aim was to determine the reliability and applicability of the
QAT for reviewing freelance translations in the Spanish Department. The posttest is the
principal contribution of this study, as it attempts to empirically validate the conceptual
(FCA) and the methodological (MAP) improvements made to the QAT. It was conducted
in 2013 and its aim was to analyze and assess the changes that occurred between the
pretest and the posttest, in order to check whether the modified tool had reached the
expected results. Specifically, two hypotheses were posed to validate the use of the MAP
in the DGT:

(1) Using a TQA tool that combines a qualitative and quantitative approach allows for the
provision of a more comprehensive view of a quality translation.
(2) Using the MAP for the TQA of the DGT's outsourced translations improves inter-
and/or intra-rater reliability.
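
To make the second hypothesis concrete, the sketch below shows one common way of quantifying agreement between two sets of categorical grades. The passage does not say which statistic the study uses, so Cohen's kappa is introduced here purely as an illustration, and the grades in the example are invented, not study data. The same function can serve for inter-rater reliability (two reviewers, same texts) or intra-rater reliability (one reviewer, two sessions).

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical grades.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e the agreement expected by chance from each rater's
    marginal grade frequencies.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must grade the same, non-empty set of texts")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one and the same grade throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# INVENTED grades for six texts (the size of the secondary corpus), using the
# category labels of Table 1. These are not results from the study.
reviewer_1 = ["Good", "Good", "Acceptable", "Below average", "Good", "Acceptable"]
reviewer_2 = ["Good", "Acceptable", "Acceptable", "Below average", "Good", "Good"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

A value of 1 indicates perfect agreement and 0 agreement no better than chance; with only six texts, any such coefficient should be read as indicative at most.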

3.2. Corpus and respondents


For the pretest and the posttest, a corpus of text samples was compiled (Bowker, 2001;
Corpas, 2001): a primary corpus (pretest) and a secondary corpus (posttest). Furthermore,
a group of respondents participated in processing each corpus according to specific
instructions.

3.2.1. Primary corpus and respondents


To check the QAT's reliability, the Spanish Department compiled 30 translations
(English–Spanish) from 15 different subjects commonly dealt with by the EC (European
Commission).⁶ These texts were chosen from those that required evaluation at that time,
making up a parallel, specialized, multilingual and ad hoc corpus (Corpas, 2001,
pp. 158–165).
The respondents were freelance translators (who provided the translations) and in-house
translators (the reviewers). Access to confidential data, such as the identity and
characteristics of freelancers, was not granted. The reviewers, all European civil servants,
were 18 in-house translators. The test consisted of the same reviewer evaluating one
translation with both the traditional method (assessment template, Appendix 2) and
with the QAT. Ten reviewers evaluated one text, four evaluated two texts and the four
remaining reviewers evaluated three texts each.⁷
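
The allocation just described can be checked against the size of the primary corpus with a short calculation. The sketch assumes, as the figures suggest but the text does not state outright, that each of the 30 translations was assessed by exactly one reviewer.

```python
# (number of reviewers, texts evaluated by each), as reported in the text
allocation = [(10, 1), (4, 2), (4, 3)]

reviewers = sum(count for count, _ in allocation)
evaluations = sum(count * texts for count, texts in allocation)

# 10 + 4 + 4 reviewers; 10*1 + 4*2 + 4*3 evaluated texts
print(reviewers, evaluations)
```

The totals match the 18 in-house reviewers and the 30 translations of the primary corpus.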

3.2.2. Secondary corpus and respondents


In order to experiment with the MAP, a subcorpus was compiled from the primary corpus.
The 30 original items were reduced to six due to reasons beyond the researchers' control.
The workload of the translator-reviewers who participated in the pretest prevented them
from taking part in the posttest. Instead, the Spanish Department management kindly
offered the cooperation of two members from the Quality and Coordination Group
(QCG). This unexpected event compelled us to validate these two new experts to make
sure they were on the same level as the respondents of the QAT test in 2009 (see
Section 3.5.2). For this purpose, the primary corpus underwent a non-random filtering
process following a triple representativeness criterion. The texts selected for the posttest
(secondary corpus) either belonged to different subjects, had obtained coinciding marks
in the pretest or had obtained differing marks in the pretest. In addition, as marks obtained
with the traditional method and with the QAT in the pretest coincided in roughly one-
third of the cases, a similar proportion was maintained in the secondary corpus. This
filtering process allowed for the selection of six translations characterized by their
authenticity, representativity and specificity (Sinclair, 1991), which are features required
by every corpus.
The value of these texts is their authenticity, since they were not deliberately made for
this purpose (Bowker & Pearson, 2002, p. 9). They are real translations, produced by
professional freelance translators and evaluated according to set criteria, which highlights
their great ecological validity (Orozco, 2001, p. 100). They are representative texts as
this concept is now related to the fitness for purpose notion (Varantola, 2002, p. 174).
The secondary corpus is appropriate for a study on TQA at the DGT because it contains
texts that are genuinely assessed in that context. Therefore, the specificity of the secondary
corpus lies in its professional profile, its purpose and its homogeneity (Corpas, 2001,
p. 157). As it is an ad hoc corpus, designed in advance to fulfill some specific purposes
(Bowker, 2001, p. 349), it is justifiably unbalanced and of limited size, although extremely
homogeneous (Corpas, 2001, p. 164) (see Appendix 3). It is therefore an adequate corpus
for carrying out this preliminary pilot study with the MAP.
As for the group of respondents, it includes freelance translators from the pretest and
two translator-reviewers from the Spanish Department, members of the Coordination
and Quality Group (CQG). The latter are highly experienced translators who are
responsible for dealing with terminology requests, maintaining the IATE⁸ database and the
Guía del Departamento de Lengua Española (the Spanish Department's Guide), organizing
in-house training and coordinating all quality-related issues (including analyzing the
linguistic quality of translations, both internally and externally). The extensive experience
of these two members, in addition to their wide range of functions and their valuable
experience with TQA, allows them to be considered as internal experts. These respondents
took part in an assessment at the end of 2013, in which they evaluated the secondary corpus
with the help of the new MAP tool. Moreover, they completed a questionnaire aimed at
gathering information on three topics: the respondents' profile, the corpus and the MAP tool.

3.3. Tools
The posttest aims to empirically validate the MAP and its two modules. However, within
the framework of this study, the MAP also serves a second function: that of a collection
tool that gathers information via the questionnaire.
The principal contribution of the MAP to TQA is its functional-componential rubric. It
is based on the analysis and improvement of two textual and functional approaches:
Colina's (2008, 2009) model and the ATA rubric (v. 2011). The Functional-Componential
Approach embeds Nord's functional view of translation quality into the four dimensions
into which the quality construct is broken down, and thus it is also an integral part of
the MAP.

3.3.1. The MAP qualitative module


The qualitative module separates the quality construct into four smaller parts: the
dimensions. The first dimension, 'Functional, pragmatic and textual adequacy', measures the
TT adequacy in relation to its aim, defined by the assignment specifications and the needs of
the target audience. The FCA regards the functional issues as those textual features that
help the TT to fulfill its intended function within the context of its reception. Pragmatic
issues refer to all extralinguistic elements that characterize any communicative setting, and
determine the sender's (author/translator) message and the target audience's (reader)
interpretation (Escandell, 1996, p. 14). Hence, these features will not be evaluated in the
formal terms of 'correct' and 'incorrect', but instead in the pragmatic terms of 'adequate'
or 'inadequate' (Escandell, 1996, p. 29), according to the intended function of the text
and the target reader (Nord, 1997, p. 35).
The second dimension, 'Specialized lexical units and content adequacy', refers to the
TT's conveyance of specialized lexical units and content in an adequate and coherent
manner. From a cognitive and pragmatic viewpoint (León, Faber, & Montero-Martínez,
2011), it is understood that expert knowledge conceptualizes reality through
categorization structures typical of specialized domains. Consequently, the distinction
between general and specialized language (in varying degrees of specificity) is subject to
cognitive and pragmatic criteria. In fact, there is no clear division between them; rather,
they are adjoining and overlapping realities (Montero-Martínez, Faber, & Buendía, 2011,
p. 93).
The third dimension, 'Non-specialized lexical units and content adequacy', describes
the TT's conveyance of non-specialized lexical units and content in an adequate and
coherent manner. Therefore, the transferred knowledge corresponds to a basic
categorization of the world, and is verbalized through lexical units with non-specialized
semantic features (Montero-Martínez et al., 2011, p. 22). This adequate use of language
includes compliance with the language usage norms of the TT speaker community.

The last dimension, 'Normative and stylistic adequacy', focuses on the observance of
grammar, spelling and punctuation rules in the TT and the use of an adequate style,
bearing in mind the aim and the target audience. As Malinowski (1923) states, texts
have to be understood in relation to their context of culture (genre), but also in relation
to their context of situation (register). Thus texts, in relation to the extratextual
(Gómez, 2006, p. 422) and intratextual situations, have to be clear, precise and concise.
Three factors are considered when studying the relationship between language and the
specific communicative situation in which it is used: field, tone and mode (Martin,
1992). Field refers to the subject dealt with by the Directorates-General of the EC; tone
refers to the protagonists of communication, who, in this text type, are a writer expert
in the field and a general reader, with varying degrees of knowledge; and mode refers to
the communication channel, which in this case is written.
The assessment rubric is a table in which the dimensions (columns) and the levels of
mastery (rows) intersect, with the descriptors in the cells (Table 2). The descriptors define
the concept alluded to by each dimension, drawing a quality continuum from the highest to
the lowest level of adequacy of the described feature. In addition, points are allotted to the
grades obtained in order for them to be operative (Mossop, 2007, p. 184). The thicker
horizontal line in Table 2 shows the acceptability threshold between the pass and fail
categories. In the MAP, each descriptor is associated with a number of allotted points that
will be added up to arrive at the final count. These points vary depending on the rank of
each dimension in the order of preference (Table 2).
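
The way the rubric yields a final count can be sketched as follows. The dimension names and level labels come from the rubric described above; the point values are hypothetical, since the actual figures belong to Appendix 1 and are not reproduced in this section. The ranking of the dimensions (here simply their order of presentation) and the shares attached to each level are likewise assumptions chosen only to make the example run.

```python
# Dimension names and level labels follow the rubric described in the text.
# MAX_POINTS and LEVEL_SHARE are HYPOTHETICAL: the real values are in Appendix 1.
MAX_POINTS = {
    "Functional, pragmatic and textual adequacy": 40,
    "Specialized lexical units and content adequacy": 30,
    "Non-specialized lexical units and content adequacy": 20,
    "Normative and stylistic adequacy": 10,
}

# Percentage of a dimension's maximum awarded at each level of mastery.
LEVEL_SHARE = {
    "Very good": 100,
    "Good": 80,
    "Acceptable": 60,
    "Below average": 40,
    "Unacceptable": 0,
}


def descriptor_points(dimension: str, level: str) -> int:
    """Points allotted to the descriptor at one dimension/level intersection."""
    return MAX_POINTS[dimension] * LEVEL_SHARE[level] // 100


def rubric_total(ratings: dict) -> int:
    """Add up the points of the descriptor chosen in each of the four dimensions."""
    missing = set(MAX_POINTS) - set(ratings)
    if missing:
        raise ValueError(f"unrated dimensions: {sorted(missing)}")
    return sum(descriptor_points(dim, level) for dim, level in ratings.items())


example_ratings = {
    "Functional, pragmatic and textual adequacy": "Good",
    "Specialized lexical units and content adequacy": "Acceptable",
    "Non-specialized lexical units and content adequacy": "Very good",
    "Normative and stylistic adequacy": "Below average",
}
total = rubric_total(example_ratings)
print(total)
```

Because higher-ranked dimensions carry more points in this sketch, a weak rating on the first dimension costs more than the same rating on the last, which mirrors the idea that points depend on each dimension's rank.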

3.3.2. The MAP quantitative module


The quantitative module is a new version of the QAT, redesigned and adjusted to correct
its detected weaknesses and provide it with a functional conceptual foundation. First, the
error typology is based on that used in the QAT, due to its long tradition within the DGT
and to an analysis of the most renowned quantitative evaluation systems, which revealed a
strong accord with this typology (Martínez-Mateo, 2014b, p. 84). All this justified further
experimentation. As for error seriousness, the distinction between high and low errors is
preserved. The former cause the reader to interpret the text in a different way from that
which is intended, or prevent the reader from clearly understanding the message.
However, the MAP includes a new error type (Addition (AD)) to complement errors
related to accuracy and clarity.
Second, the FCA establishes a correspondence between the qualitative module dimensions
and the quantitative module error typology. The functionalist 'fit-for-purpose' view,
which is embodied in Nord's (2009) error typology, acts as a hinge between the top-down
and bottom-up approaches of the FCA, creating a methodological continuum in which
pragmatic, cultural and linguistic errors⁹ (Nord, 2009, p. 238) serve as a reference to

Table 2. Qualitative module: assessment rubric.

Dimensions (table columns):
  1. Functional, pragmatic and textual adequacy
  2. Specialized lexical units and content adequacy
  3. Non-specialized lexical units and content adequacy
  4. Normative and stylistic adequacy

Very good
  Functional, pragmatic and textual adequacy: The TT is adequate from the functional,
    pragmatic and textual viewpoint and in relation with its usage, conditioned by the
    assignment specifications and the needs of target audience.
  Specialized lexical units and content adequacy: The TT conveys specialized lexical
    units and context in a correct, coherent and adequate manner.
  Non-specialized lexical units and content adequacy: The TT conveys non-specialized
    lexical units and content in a correct, coherent and adequate manner.
  Normative and stylistic adequacy: The TT abides by the grammar, spelling and
    punctuation rules of the target language and employs an adequate style, bearing in
    mind the purpose and the target audience.

Good
  Functional, pragmatic and textual adequacy: The TT is close to the purpose, usage,
    assignment specifications and needs of the target audience, although it may require
    some minor changes.
  Specialized lexical units and content adequacy: The TT may contain some minor
    inadequacy(ies) regarding the conveyance of specialized lexical units and/or content,
    bearing in mind the context and the purpose of the TT.
  Non-specialized lexical units and content adequacy: The TT may contain some minor
    inadequacy(ies) regarding the conveyance of non-specialized lexical units and/or
    content, bearing in mind the context and the purpose of the TT.
  Normative and stylistic adequacy: The TT contains (or may contain) some grammar,
    spelling or punctuation error(s) and/or contains some minor style inadequacy(ies) in
    TT for the purpose and the target audience.

Acceptable
  Functional, pragmatic and textual adequacy: The TT sufficiently complies with the
    purpose, purported usage, assignment specifications and needs of target audience,
    although it requires some changes.
  Specialized lexical units and content adequacy: The TT contains some inadequacies
    regarding the conveyance of specialized lexical units and/or content, bearing in mind
    the context of the TT, although it sufficiently complies with its purpose.
  Non-specialized lexical units and content adequacy: The TT contains some inadequacies
    regarding the conveyance of non-specialized lexical units and/or content, bearing in
    mind the context of the TT, although it sufficiently complies with its purpose.
  Normative and stylistic adequacy: The TT contains some grammar, spelling and/or
    punctuation error(s) and/or contains some style inadequacy(ies) that do not
    compromise its purpose for the target audience.

Below average
  Functional, pragmatic and textual adequacy: The TT observes the purpose, usage,
    assignment specifications and needs of the target audience, although it contains
    several minor inadequacies or one major inadequacy that require(s) important
    amendments.
  Specialized lexical units and content adequacy: The TT contains some minor
    inadequacies or one major inadequacy regarding the conveyance of specialized lexical
    units and/or content, bearing in mind the context and the purpose of the TT, which
    impairs its utility as a translation.
  Non-specialized lexical units and content adequacy: The TT contains some minor
    inadequacies or one major inadequacy regarding the conveyance of non-specialized
    lexical units and/or content, bearing in mind the context and the purpose of the TT,
    which impairs its utility as a translation.
  Normative and stylistic adequacy: The TT contains some grammar, spelling and/or
    punctuation error(s) and/or contains some minor inadequacies or one major style
    inadequacy that impair the correct conveyance of the message for the purpose and
    the target audience.

Unacceptable
  Functional, pragmatic and textual adequacy: The TT does not observe the purpose,
    usage, assignment specifications and needs of the target audience. It contains major
    inadequacies.
  Specialized lexical units and content adequacy: The TT contains some minor
    inadequacies or one major inadequacy regarding the conveyance of specialized lexical
    units and/or content, bearing in mind the context and the purpose of the TT.
  Non-specialized lexical units and content adequacy: The TT contains some minor
    inadequacies or one major inadequacy regarding the conveyance of non-specialized
    lexical units and/or content, bearing in mind the context and the purpose of the TT.
  Normative and stylistic adequacy: The TT contains grammar, spelling and/or
    punctuation error(s) and/or contains style inadequacy(ies) unacceptable for the
    purpose and the target audience.

Figure 1. Qualitative and quantitative continuum in the FCA.

integrate quantitative and qualitative aspects. Figure 1, from the left, links, one by one,
each dimension of the FCA to the two macroerror types (pragmatic and linguistic).
From the right, analogous links between the QAT's eight error types and Nord's functional
errors10 are established.
Third, the reviewers identified errors by comparing the source and the target texts
according to the requirements of the assignment, usage conditions and target-culture
conventions (Nord, 2009, p. 237). Thus, the reviewer will determine whether an error is
linguistic or pragmatic (functional); that is to say, he/she will assess the error according to
its impact on the function of the text, bearing in mind the communicative effect on the reader
(Kussmaul, 1995, p. 132).
Fourth, the QAT's textual profiles have been replaced with the DGT's binary text
classification, which categorizes texts into two types (Quality Control 1 (QC1), Quality
Control 2 (QC2)) according to the quality control level they undergo. These control
levels depend on the aim and quality requirements of the TT. The reviewer simply has
to look up the text type in the lists found in the Spanish Department of the DGT's Revision
Manual.11
Fifth, as one of the greatest deficiencies of quantitative models has to do with the
variability in how errors are tagged once detected, the meta-rules12 of the SAE J2450 model
are incorporated in order to standardize error tagging. Furthermore, a preference order of
errors (Martínez-Mateo, 2014a, p. 256) is set up according to textual profile. This follows a
top-down hierarchy. Figures 2 and 3 show the order of preference that, when in doubt,
the reviewer will follow to tag errors. Errors are ordered from the highest to the lowest
with decreasing penalty values.

Figure 2. Error type order of QC1 profile.

Figure 3. Error type order of QC2 profile.

Thus, when in doubt about how to tag an error using the MAP quantitative module, the
reviewer will choose: pragmatic over linguistic; the error type (SENS, TERM, etc.) according
to the order of preference; and high over low.
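
The three tie-breaking rules can be read as a lexicographic ordering. The following is a minimal Python sketch under stated assumptions: the error-type order used here is a placeholder (the actual QC1 and QC2 orders are those shown in Figures 2 and 3, which the text does not spell out), and the names `preferred_tag`, `ASSUMED_TYPE_ORDER`, `NATURE_RANK` and `SEVERITY_RANK` are illustrative only and not part of the MAP tool.

```python
# Hypothetical order of preference for error types; the real QC1/QC2 orders
# are those of Figures 2 and 3 and may differ from this placeholder.
ASSUMED_TYPE_ORDER = ["SENS", "OM", "TERM", "GR", "SP", "PT"]

NATURE_RANK = {"pragmatic": 0, "linguistic": 1}  # pragmatic over linguistic
SEVERITY_RANK = {"high": 0, "low": 1}            # high over low


def preferred_tag(candidates, type_order=ASSUMED_TYPE_ORDER):
    """Return the tag a reviewer in doubt would choose.

    Each candidate is a (nature, error_type, severity) tuple. Candidates are
    compared first by nature, then by error-type preference, then by severity.
    """
    return min(
        candidates,
        key=lambda tag: (
            NATURE_RANK[tag[0]],
            type_order.index(tag[1]),
            SEVERITY_RANK[tag[2]],
        ),
    )


doubtful = [
    ("linguistic", "SENS", "high"),
    ("pragmatic", "OM", "low"),
    ("pragmatic", "OM", "high"),
]
print(preferred_tag(doubtful))
```

Passing a different `type_order` list is one way such a sketch could reflect the separate preference orders of the two text profiles.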

3.3.3. Questionnaire
The questionnaire was designed to take into account the objectives and the empirical
variables of this study, and is therefore structured into three content blocks.13 The first block
(Sections I–III) focuses on the academic and professional profile of the respondents. The
second block (Section IV) looks into the theoretical underpinnings of the corpus regarding
its adequacy, representativeness and sample extraction. The third block (Sections V–VI)
deals with the MAP. Specifically, it poses questions regarding the qualitative module,
the relevance of the dimensions, the clarity of the descriptors and the suitability of the
scores. As for the quantitative module, the new text classification is assessed, as is the
appropriateness of the error typology and the associated weights. The last three questions
ask the respondent for an overall assessment of the MAP.
As far as the design is concerned, the funnel technique was used, i.e. the questionnaire
goes from the general to the particular. In order to allow for a wide range of answers,
open, dichotomous and polychotomous choice questions were employed. The language
used is clear and simple, in an attempt to create exclusive, unambiguous questions. The
suitability of the questionnaire was subsequently validated by an expert on Methods of
Research and Diagnostics in Education from the University of Castilla-La Mancha (Spain) and
by three experts on Translation and Interpreting Quality Assessment from the University of
Granada (Spain) with the help of a validation guide (Martínez-Mateo, 2014a, pp. 392–399).

3.4. The pilot study


In order to carry out the pilot study (posttest), that is, the test of the MAP tool on the
secondary corpus, the respondents received all the necessary materials in an electronic folder
called 'Dossier'. The folder contained instructions to guide the reviewer in the process, the
corpus, the MAP tool and the questionnaire.

3.4.1. Procedure
Efficiency and efficacy governed all the decisions regarding the empirical development of
the test. Thus, the initial plan of holding an on-site training session with the two
participants was discarded due to their time limitations. Instead, they were contacted via email.
Before commencing the study, they knew nothing about the test. Since these two members
of the QCG had not participated in the pretest, first of all they had to be validated as
suitable candidates for the posttest (see Section 3.4.2). Then came the actual test. It was
intended as an empirical validation of the conceptual and methodological improvements
of the QAT, which aimed to improve the MAP. To carry out the process, a two-week
deadline was agreed with the participants. In a second round of emails, we sent the participants
a dossier that contained two Word files and three folders. One Word file described the
general framework of the study and included all the necessary information to complete
the test. The first folder included the secondary corpus texts. The second folder comprised
the MAP tool, with its two modules, and instructions on how to use them. Here the
theoretical approach, the functions available and the customization capacity of MAP were
summarized. As for the new qualitative module, a read through the assessment rubric and its
brief instructions was sufficient for them to learn to use it. The third folder was composed
of two subfolders: one for storing the assessment reports of the qualitative module and
another for the quantitative module reports. The last Word file was the questionnaire
that was to be completed as a conclusion.
The most important information contained in the dossier was the implementation of
the MAP, although the participants' extensive knowledge of the QAT facilitated this
task. Nonetheless, a more detailed, step-by-step explanation of the procedure they were
given follows.
Using the Track Changes feature in Word, reviewers had to mark the errors in the
selected sample. First, they evaluated the texts with the help of the MAP qualitative
module. For that purpose, they followed the order of preference of the dimensions in
the assessment rubric, from left to right. Next, they registered the mark obtained in
each dimension in the file 'Assessment summaries' (see Appendix 1), the final count of
which appears in the 'Final mark' column. Then they evaluated the texts according to
the MAP quantitative module (Figure 4). First, they had to choose the text 'Profile'
(QC1 or QC2) and select the page number ('Pages'). Next, they compared the original
text (OT) and TT to pinpoint errors and tag them according to their nature, type and
severity. To calculate the 'Final grade', the values of both modules are added together
and then divided by two ('MAP Assessment summary' in Appendix 1). That result is
recorded in the 'Recommendation' (Appendix 1) table, and corresponds to the tool's
recommendation for that particular translation.

Figure 4. MAP interface. © European Union 2015.
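
The grade arithmetic described in this procedure can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and treating the qualitative grade as the plain sum of the four dimension marks is an inference from the figures reported later in Table 7, not a description of the tool's internals.

```python
def qualitative_grade(dimension_points):
    """Sum the marks awarded to the four rubric dimensions (as in Table 7)."""
    return sum(dimension_points)


def final_grade(qualitative, quantitative):
    """Add the two module grades together and divide by two."""
    return (qualitative + quantitative) / 2


# Text 1, Rater A: dimension marks from Table 7, quantitative grade from Table 4.
qual = qualitative_grade([30, 20, 25, 15])
print(qual, final_grade(qual, 86))
```

With these inputs the sketch reproduces the values reported for Text 1, Rater A (a qualitative grade of 90 and a final grade of 88).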

3.4.2. Experimental restrictions


Firstly, as stated in Section 2, the participants of this exploratory study work for the DGT
of the European Commission, a high-pressure work environment. Owing to that, and
despite their willingness to collaborate, their availability was limited. The workload of
the professional translator-reviewers who participated in the pretest prevented them
from taking part in the posttest. Instead, the Spanish Department management kindly
offered cooperation from the QCG, made up of two members. This unexpected event
compelled us to shorten the primary corpus used in the pretest (see Section 3.2.2) and
to validate these two new experts so as to make sure that they were as qualified
as the respondents of the QAT test in 2009.
Therefore, and for the sake of effectiveness, the respondents (hereinafter reviewer A and
reviewer B) were asked to assess the six texts of the secondary corpus (extracted from the
primary corpus) with the QAT. The following premise was considered: if reviewers A and
B got similar marks (that is to say, within the same mark range according to the QAT's quality
assessment scheme, Table 1) to those obtained by the pretest respondents in the assessed
texts, they would be validated to perform in the posttest in the same conditions as if they
had taken part in the pretest. The comparison of the results showed total coincidence,
which allowed for their validation as suitable respondents for the posttest
(Martínez-Mateo, 2014a, pp. 298–299). The respondents then evaluated the secondary
corpus and completed the questionnaire. Unfortunately, in such demanding circumstances,
there was no time for a face-to-face training session on the tool, which would
have solved many of the inquiries subsequently posed by the respondents. Thus, the a
priori and experimental restrictions of this research conditioned its development and the
respondents' perception.

4. Results and discussion


The results gathered in the first two measurements of the pretest-posttest (2009 and 2013)
allow some conclusions to be drawn regarding the concurrence or divergence of marks
obtained with the DGT traditional methodology and the QAT, and then with the MAP.

4.1. Pretest results


The pretest results correspond to the TQA of the primary corpus texts (30 translations)
applying the Traditional method and the QAT. With the Traditional method, 27
translations passed the acceptability threshold and three failed. With the QAT, only 20
translations passed, which denotes that assessment with the QAT produced lower marks than
with the Traditional method (Martínez-Mateo, 2014a, p. 312).
The marks given by the same reviewer using the two methods mentioned above (intra-rater
assessment) were gathered in order to calculate the degree of agreement or disagreement
of the QAT (intra-rater reliability). The marks agreed in 13 and disagreed in 17 texts;
averaged over the 18 raters (Table 3), the intra-rater general agreement percentage was
39.81%.14
As can be seen in Table 3, while some raters assessed only one text (10 raters), others
assessed two or three texts (eight raters). From this, no regular pattern can be identified, as
some raters who assessed only one text got 100% agreement, while others got 100%
disagreement and, simultaneously, raters who assessed two or three texts also obtained varying
degrees of coincidence (from 0 to 100%).
If we look at the distribution of similar marks per section, Figure 5 shows that the
largest accumulation (11/13) was in the two higher marks ('Very good', 'Good'), while
the three other mark sections ('Acceptable', 'Below average', 'Unacceptable') obtained
fewer occurrences (2/13). The heterogeneity of these data is evidenced by the uneven
distribution of scores, particularly in the mid-range values.

Table 3. Intra-rater assessment results (pretest).

Rater (R)   No. of texts with    No. of texts with    % coincidence
            coinciding grades    differing grades     per R
R1          3                    0                    3/3 (100)
R2          0                    3                    0/3 (0)
R3          1                    2                    1/3 (33.3)
R4          0                    1                    0/1 (0)
R5          1                    0                    1/1 (100)
R6          1                    0                    1/1 (100)
R7          0                    1                    0/1 (0)
R8          1                    1                    1/2 (50)
R9          0                    1                    0/1 (0)
R10         1                    0                    1/1 (100)
R11         1                    2                    1/3 (33.3)
R12         0                    1                    0/1 (0)
R13         0                    1                    0/1 (0)
R14         1                    1                    1/2 (50)
R15         0                    1                    0/1 (0)
R16         2                    0                    2/2 (100)
R17         1                    1                    1/2 (50)
R18         0                    1                    0/1 (0)
Totals      13 coincidences      17 differences       39.81% coincidence
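
The pretest yields two different agreement figures (39.81% in Table 3 and the 43% quoted in the text). A plausible reading, consistent with note 14, is that the former averages the per-rater percentages while the latter pools all 30 texts. The Python sketch below checks this arithmetic against the Table 3 counts, reading each blank cell as zero; the variable names are illustrative only.

```python
# (coinciding, differing) texts for raters R1..R18, as listed in Table 3.
TABLE_3 = [
    (3, 0), (0, 3), (1, 2), (0, 1), (1, 0), (1, 0),
    (0, 1), (1, 1), (0, 1), (1, 0), (1, 2), (0, 1),
    (0, 1), (1, 1), (0, 1), (2, 0), (1, 1), (0, 1),
]

# Mean of the per-rater coincidence percentages (one value per rater).
per_rater = [100 * same / (same + diff) for same, diff in TABLE_3]
mean_of_raters = sum(per_rater) / len(per_rater)

# Pooled percentage: coinciding texts over all assessed texts.
total_same = sum(same for same, _ in TABLE_3)
total_texts = sum(same + diff for same, diff in TABLE_3)
pooled = 100 * total_same / total_texts

print(f"{mean_of_raters:.2f} {pooled:.2f}")
```

Because raters assessed between one and three texts each, the per-rater mean weights every rater equally while the pooled figure weights every text equally, which would explain why the two values differ.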

Figure 5. Distribution of coinciding marks (pretest).

From a gender viewpoint, the data collected show that, on average, female intra-rater
reliability was Acceptable (66.6%) and male intra-rater reliability was Below average (27.3%).
According to these data, overall intra-rater reliability can be regarded as being low, with
an agreement of 43% and a slightly higher divergence of 57%. Nonetheless, reliability is
inconclusive as these are mid-range values.
It is also worth noting that in the case of the Spanish Department, the QAT was
implemented following the standard guidelines because, due to the nature of their
in-house staff, it is one of the Linguistic Departments of the DGT that outsources the fewest
translations. Hence, they used all the preset values and did not make any adjustments.
This fact undermines the QAT's value, for, as the developers of QAT stated, its greatest
virtue lies in its customization capacity.

4.2. Posttest results


The results of the posttest are the grades obtained by reviewers A and B in the TQA of the
secondary corpus by means of the MAP, together with the results of the questionnaire. In
addition, these results were compared with the grades obtained by the same reviewers
using the QAT and the Traditional method in the validation phase (see Section 3.4.2)
in order to draw conclusions about intra-rater reliability and inter-rater reliability.

4.2.1. Quantitative and qualitative module results


Table 4 shows the results obtained by reviewers A and B concerning the texts of the
secondary corpus. The columns show the quantitative and qualitative modules, as well as the
final MAP grade.
As for the quantitative module, the marks obtained by reviewers A and B on the same
text concur in four texts (3, 4, 5 and 6) and differ in two texts (1 and 2). This highlights the
need to complement the microtextual approach.
Regarding the errors labeled during the test (Appendix 4), reviewers A and B used the
new MAP function to tag mistakes according to their nature. For example, error 23
(Omission, Pragmatic, High; Table 5) clearly shows how this inadequacy is not the result of
breaking linguistic rules but of the peculiarities of the situational context. It should be
noted that the space reserved for comments is limited.15
Likewise, to identify error 28 (Omission, Pragmatic, Low; Table 6), it is necessary to
know the purpose of the mentioned component. This information cannot merely be
derived from the linguistic content but requires additional contextual information.

Table 4. Grades obtained with MAP (posttest).

MAP results
                   Qualitative module   Quantitative module   Final grade
Text 1, Rater A    90                   86                    88 (Very good)
Text 1, Rater B    90                   79                    84.5 (Good)
Text 2, Rater A    75                   50                    62.5 (Acceptable)
Text 2, Rater B    75                   60                    67.5 (Acceptable)
Text 3, Rater A    50                   45                    47.5 (Below average)
Text 3, Rater B    50                   52.5                  51.25 (Below average)
Text 4, Rater A    90                   94                    92 (Very good)
Text 4, Rater B    90                   94                    92 (Very good)
Text 5, Rater A    50                   50                    50 (Below average)
Text 5, Rater B    50                   42.5                  46.25 (Below average)
Text 6, Rater A    75                   61.6                  68.3 (Acceptable)
Text 6, Rater B    75                   61.6                  68.3 (Acceptable)

Table 5. Error 23.

Text and rater: Text 5, A
Original text: Invalid comment (max. 255 chars)
Freelance translation: Comentario no válido. [Invalid comment]
Rater translation: Comentario demasiado extenso: no se aceptan más de 255 caracteres.
  [Comment too long: not more than 255 chars. will be accepted]
Error type, nature and seriousness: OM P high

Table 6. Error 28.

Text and rater: Text 6, A
Original text: The EGNOS components located in the EOIG hosting sites shall be
  operated by the relevant EOIG Members on a commercial basis
Freelance translation: Los miembros del EOIG, o las entidades que éstos designen,
  explotarán los componentes de EGNOS
  [The EGNOS members, or the entities designated by them, will operate the EGNOS
  components ]
Rater translation: Los miembros del EOIG, o las entidades que éstos designen,
  explotarán comercialmente los componentes de EGNOS
  [The EGNOS members, or the entities designated by them, will perform commercial
  operations on the EGNOS components ]
Error type, nature and seriousness: OM P low

Regrettably, there were no instances of the new error type, AD, so no conclusions can be
reached.
Regarding the results obtained with the qualitative module, reviewers agreed in all
cases. An example of this can be found in Table 7, which shows that, in texts 1 and 3,
reviewers have ticked the same descriptors for each dimension.

Table 7. Descriptors ticked in texts 1 and 3.


Qualitative module
Text 1 Text 3
Rater A Rater B Rater A Rater B
Dimension 1 30 30 15 15
Dimension 2 20 20 15 15
Dimension 3 25 25 15 15
Dimension 4 15 15 5 5
Grade 90 90 50 50

4.2.1.1. Intra-rater reliability. The intra-rater reliability of the posttest is analyzed by
comparing the results obtained by each reviewer (A and B), using comparable tools (intra-rater
assessment): the QAT in the validation phase and the MAP quantitative module in the posttest.
As Table 8 shows, the scores obtained by reviewers A and B with the help of the QAT in the
validation phase and the MAP quantitative module agree in three texts and differ in the
other three (Martínez-Mateo, 2014a, p. 312).

Table 8. Intra-rater evaluation with QAT and MAP (quantitative module).


RATER A RATER B
QAT in validation MAP quantitative module QAT in validation MAP quantitative module
Text 1 79 86 70 79
Text 2 40 50 40 60
Text 3 30 45 37.5 52.5
Text 4 88 94 94 94
Text 5 40 50 40 42.5
Text 6 73.3 61.6 73.3 61.6

These data show a low intra-rater reliability of both reviewers using the QAT and the
MAP quantitative module, as 50% is considered to be Below average.

4.2.1.2. Inter-rater reliability. The inter-rater reliability of reviewers A and B was analyzed
in four cases.
First, we compare the marks obtained by both reviewers when assessing the same text
with comparable tools. More precisely, these scores were obtained with the MAP
quantitative module and the results gathered with the QAT in the expert validation phase.
Table 9 shows that texts 1, 2, 4 and 5 (66.6%) had similar marks and texts 3 and 6 had
differing marks (Martínez-Mateo, 2014a, p. 324). All these data confirm that the
quantitative-based tools tested produce an acceptable inter-rater reliability (66.6%).

Table 9. Grades obtained with QAT and MAP quantitative modules (Inter-rater).
Experts validation QAT Posttest MAP quantitative module
Rater A Rater B Rater A Rater B
Text 1 79 70 86 79
Text 2 40 40 50 60
Text 3 30 37.5 45 52.5
Text 4 94 94 94 94
Text 5 40 40 50 42.5
Text 6 73.3 73.3 61.6 61.6

Figure 6. Distribution of grades obtained with quantitative-based tools.

However, the distribution of marks in a bar graph (Figure 6) does not show any regular
pattern. It is only worth mentioning that the lowest marks (text 3) present a greater graph
dispersion and that the highest marks (text 4) appear closer.
Second, when comparing the marks obtained by reviewers A and B using the MAP
qualitative module (Table 7), they agreed in all cases. Additionally, the reviewers ticked
the same descriptors for each dimension (Martínez-Mateo, 2014a, p. 325).
Third, the marks obtained using the MAP quantitative module agree in four of six texts
(66.6%; Table 9), which points to an acceptable inter-rater reliability.
Fourth, the final marks given by reviewers A and B using the MAP agreed in five out of
six texts (83.3%; Table 4). In other words, inter-rater reliability is between good and very
good; besides, in the remaining text, the difference is a mere 0.5 points (Martínez-Mateo,
2014a, pp. 325–326).
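
The 83.3% figure for the final marks can be checked directly against the quality ranges reported in Table 4. The short Python sketch below does so; it assumes, as the text does, that two marks agree when they fall in the same quality range, and the variable names are illustrative.

```python
# Quality range of the final MAP grade per text, from Table 4: (Rater A, Rater B).
FINAL_RANGES = {
    1: ("Very good", "Good"),
    2: ("Acceptable", "Acceptable"),
    3: ("Below average", "Below average"),
    4: ("Very good", "Very good"),
    5: ("Below average", "Below average"),
    6: ("Acceptable", "Acceptable"),
}

# Texts on which both raters land in the same quality range.
agreeing = [text for text, (a, b) in FINAL_RANGES.items() if a == b]
rate = 100 * len(agreeing) / len(FINAL_RANGES)

print(agreeing, f"{rate:.1f}%")
```

Only Text 1 falls into different ranges for the two raters, which gives five agreeing texts out of six.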

4.2.2. Questionnaire results


In the last step of the posttest, the respondents completed a questionnaire made up of three
sections. The first section consisted of questions about the respondents' training and
qualifications. The second dealt with the corpus: both respondents agreed that it was
representative and that the selection method of the evaluation sample was adequate.
Regarding the third section, centered on the MAP modules, respondent A raised questions
about the utility of the qualitative module and respondent B raised doubts about the
weight of the descriptors. This reflects indifference to the use of the tool (position 3 in
a 5-point Likert scale). When questioned about the two new text types (QC1 and QC2),
both respondents agreed that it had been a wise choice. As for the correspondence
established between the qualitative module dimensions and the error groups of the quantitative
module, respondents agreed on the suitability of two of the alignments and disagreed on
the suitability of the other two (Dimension 1 with all error types and Dimension 3 with
GR, SP and PT errors). Overall, their assessment of the tool in the pilot study was
'Somewhat dissatisfied' (position 2 in a 5-point Likert scale).

4.3. Discussion
In the pretest, overall intra-rater reliability is low (39.81%), as evidenced by the two
indicators stemming from the evaluations with the Traditional methodology and the QAT.
The agreement rate is low (43%, 'Unacceptable') and the divergence rate is acceptable
(57%). Also, the QAT produced lower marks than the Traditional methodology. These
poor values are due to the fact that the QAT used default settings that penalize low
errors with three points and high errors with ten, disregarding any other variables. In
addition, the analysis of errors tagged with the qualitative tools demonstrates that the
reviewers identified different errors in the same translation and, even when they did
detect the same error, they sometimes tagged it differently, which brought about different
scores.
In response to the aforementioned deficiencies, two of the material amendments
inserted in the MAP quantitative module were: the distinction between pragmatic and
linguistic errors and the setting of an order of errors per text type (applicable if in doubt), and
the application of a point-deduction scheme according to text type, nature and seriousness
of error.
Another conclusion that can be drawn is that, out of the 13 translations with similar
marks (Figure 5), 11 received 'Good' and 'Very good' marks, and only two received
mid-range marks bordering the acceptability threshold. This fact corroborates the initial
assumption that the marks in the mid-range of the quality continuum gave way to
varied opinions, whereas the 'Good' and 'Very good' marks generated higher levels of
consensus amongst raters (Martínez-Mateo, 2014a, p. 314).
Finally, and from a gender perspective, results show higher female intra-rater reliability
(66.6%); it would therefore have been desirable to have a larger number of female
respondents in order to explore the reasons for those discrepancies.
In the posttest, comparing the results obtained by the reviewers with the QAT in
the validation phase and those obtained with the MAP quantitative module reveals
the low intra-rater reliability of the quantitative-based tools (which only reach a 50%
agreement per reviewer; Table 8). This poor percentage of agreement highlights the
quantitative-based tools' deficiencies in TQA, since they only provide a partial, microtextual
view of quality, based only on the penalization of linguistic errors. In consequence, this
biased view should be complemented with a top-down approach.
The evaluation of inter-rater reliability has also been based on the results obtained with
the MAP as a whole, and with the qualitative and quantitative modules separately. The
total coincidence of the results obtained by raters A and B with the MAP qualitative
module, not only in the final score but also in every one of the descriptors ticked per
dimension (Table 7), underlines the potential value of the TQA model presented in this
study. This module provides the reviewer with a reference framework that facilitates
decision making according to a limited, practical, known, transparent and customizable
set of criteria, restricting the subjectivity of those decisions and enhancing interobjectivity.
Marks obtained by raters A and B with the quantitative module coincide in four out of
six texts (Table 9), and disagreeing marks only vary by one point, which causes a change of
quality range. The rise in intra-rater reliability with the MAP quantitative module, in
comparison to the QAT, indicates the positive effect of the improvements.
The comprehensive results of the reviewers of the posttest with the MAP offer high
inter-rater reliability (Table 4). This seems to indicate that the inclusion of the qualitative
module in the MAP contributes to the provision of a more holistic view of the analyzed
text.
Finally, the opinions expressed by the reviewers in the questionnaire provided positive
and negative feedback, depending on the issue. Based on the analysis of the respondents'
questionnaires, the training procedure used in this study seems to have some flaws
regarding the full understanding of the MAP's functionality. This stresses the need for an
in-person training session to present in full the theoretical underpinnings and capacity of
the tool. This session could take between one and two hours for reviewers with previous
knowledge of TQA aid tools.
In addition, the usefulness of the new error type (AD), as well as the possibility of
inserting another error type in reference to internal coherence, will be evaluated, following
the recommendations provided by the respondents. The specification of an error typology
has long been a controversial issue, although there are several proposals, from the early
ones of Kupsch-Losereit (1985), Gouadec (1989), Nord (1997) and Mossop (2007)
to recent ones by Jiménez-Crespo (2009, 2011) and O'Brien (2012). Far
from being a settled issue, it still generates heated debate, which calls for more empirical
testing of the previous proposals.

4.4. The MAP tool in other professional contexts


As pointed out above, the adaptation potential of the MAP is one of its greatest strengths.
The underlying conceptual framework provided by the FCA bestows upon it a flexible
componential interrelation of the items that comprise the two modules (the macrotextual
and microtextual approaches). The internal features of each module may be adjusted to a
concrete situation. The qualitative module splits the quality construct into four
dimensions and associates positive points to each dimension, while the quantitative module
has negative points allotted to the error types according to three criteria: error nature,
type and severity. Nonetheless, every parameter is customizable. The number, type and
nature of error may be subject to amendments, while the point allocation of each
module is liable to be fine-tuned according to the quality criteria demanded by a
professional institution. Besides, both modules have a preset internal order of preference of
their components (dimensions or error typology) that conditions the value of the bonus
points (qualitative) and the severity of the penalizing points (quantitative). These can
also be adjusted to specific needs.

5. Conclusions
Although we are aware of the shortcomings of this exploratory analysis, these do not
undermine the relevance of the findings. Despite the limitations, the results obtained
allow for the corroboration of the first hypothesis. The use of an assessment tool that
combines a qualitative with a quantitative approach, and is based on common quality criteria,
allows the reviewer to assess the translated text with a unified reference framework that
improves the interobjectivity and offers a more balanced view of the assessed text, as it
approaches the text from two necessary and complementary perspectives.
With regard to the second hypothesis (that is, whether the MAP tool can be validated
as a reliable tool for the TQA of the Spanish Department of the DGT's outsourced
translations), the results of this pilot study are inconclusive. This has raised conceptual and
methodological considerations with regard to future improvements. The respondents'
comments challenged some issues that need re-examination: the need to count on two
separate dimensions to assess general and specialized language; the correspondence between
the qualitative module dimensions and the quantitative module errors; and the adjustment
of the weight of both modules.
Once improvements to the conceptual model and the MAP have been implemented, a
larger-scale study (with a larger corpus and a greater number of reviewers) should provide
stronger evidence of the conceptual and methodological validity of the tool. Only then
could the tool be adjusted to other linguistic combinations within the institutional
context of the DGT, or to other professional settings, given its excellent benefit-cost ratio.

Notes
1. For an extensive review of the characteristics, weaknesses and strengths of these TQA models
and tools, see Martnez-Mateo (2014a, 2014b).
2. It provides a customizable modular TQA system for the selected content types and quality
criteria, which allows for adaptability to client preferences.
3. For more information, visit the website: http://www.atanet.org/certication/aboutexams_
rubic.pdf
4. The Guide for External Translators aims to provide external contractors with information
about the procedure and the technical and quality requirements that externalized translations
must fulfill.
5. This simple revision (Parra, 2005, p. 362) is based on known criteria and fulfills summative
(issues an assessment), formative (teaches freelance translators from their errors) and corrective
(makes amendments) functions (Martínez & Hurtado, 2001, p. 277).
6. There are 36 subjects dealt with by the EC. For further information, visit http://europa.eu/
pol/index_en.htm
7. Due to space restrictions and to ease the subsequent comparison of results, only the data
related to the corpus, the respondents and the pretest are presented here. A complete
description of all the test elements can be found in Martínez-Mateo (2014a, pp. 284–292).
8. Inter-Active Terminology for Europe (IATE) is the EU's inter-institutional terminology
database. It is publicly available at http://iate.europa.eu/SearchByQueryLoad.do;jsessionid=
DTGPV9jX0sdhVVGvN2X8bVPNlyHVGLT1GsDKDzPjHZCmyLVn0MxN!1492297265?
method=load
9. For the purposes of this study, cultural errors are included in pragmatic ones, since the
former are inadequacies related to world knowledge, and are not inferred from linguistic
signs or rules alone (linguistic errors) in a straightforward manner (Martínez-Mateo,
2014a, p. 248).
10. An in-depth description of the correspondences established can be found in Martínez-Mateo
(2014a, pp. 202–223).
11. Available on the website: http://ec.europa.eu/translation/spanish/guidelines/documents/
revision_manual_es.pdf
12. (1) when in doubt, always choose the earliest primary category; and (2) when in doubt,
always choose serious over minor.
13. The questionnaire itself is available from the authors upon request via email.
14. General coincidence percentage = total records/total participants.
15. The freelance and rater translations in the tables and in Appendix 4 have been glossed in
English.
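
The ratio given in Note 14 can be illustrated with a minimal sketch. It assumes that the result is scaled by 100 to express a percentage and that both counts are non-negative integers; the function name and the example figures are hypothetical and are not taken from the study.

```python
def coincidence_percentage(total_records: int, total_participants: int) -> float:
    """General coincidence percentage = total records / total participants.

    Assumption: the ratio is multiplied by 100 to express it as a percentage.
    """
    if total_participants <= 0:
        raise ValueError("total_participants must be positive")
    return 100.0 * total_records / total_participants


# Hypothetical figures for illustration only (not data from the study).
print(coincidence_percentage(3, 4))
```

Under these assumptions, three records among four participants would correspond to a coincidence of 75%.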

Acknowledgments
We would like to thank the members of the Spanish Department of the Directorate-General for
Translation of the European Commission for their willingness and cooperation. We would also like
to thank the Editor and the anonymous referees for their useful comments on an earlier version of
this article, and Maria Baldarelli for the language editing of the text.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This research is part of the project Cognitive and Neurological Bases for Terminology-enhanced
Translation (CONTENT) (FFI2014-52740-P), funded by the Spanish Ministry of Economy and
Competitiveness.

Notes on contributors
Roberto Martínez Mateo holds a degree in English Philology from the University of
Valladolid and another degree in Translation and Interpreting from the University of the
Basque Country. He gained his PhD in Translation and teaches English as a foreign
language, didactics and translation at the University of Castile-La Mancha. His research
interests include translation quality assessment, communicative language skills and
translation as a teaching tool for Foreign Language Teaching (FLT). He has published
in Journal of Language Teaching and Research, Miscelánea: A Journal of English and
American Studies and Ocnos.

Silvia Montero Martínez holds a degree in English Language and Literature and an M.A. in
Specialized Translation from the University of Valladolid. She has a PhD in Spanish
Linguistics. She lectures on Translation, Terminology and Translation Technologies at the
University of Granada. Her main research interests are terminology, specialized
translation and knowledge engineering. She is the author of various books and chapters on
lexical semantics, translation and terminology. Her work has been published in several
international peer-reviewed journals, such as Terminology, Perspectives, META, Babel,
and Journal of Pragmatics.

A. Jesús Moya Guijarro does research in Systemic Functional Linguistics and has
published several articles on information, thematicity and picture books in international
journals such as Word, Text, Functions of Language, Journal of Pragmatics, Text and Talk,
Review of Cognitive Linguistics and Atlantis. He is co-editor of The World Told and The
World Shown: Multisemiotic Issues (2009, Palgrave Macmillan). He is also the author of the
book A Multimodal Analysis of Picture Books for Children: A Systemic Functional
Approach (2014, Equinox).

ORCID
Roberto Martínez Mateo http://orcid.org/0000-0001-7110-8789

References
Barberà, E., & Martín, E. (2009). Portfolio electrónico: aprender a evaluar el aprendizaje. Barcelona:
Editorial UOC.
Bowker, L. (2001). Towards a methodology for exploiting specialized target language corpora as
translation resources. International Journal of Corpus Linguistics, 5(1), 17–52. doi:10.1075/ijcl.5.1.03bow
Bowker, L., & Pearson, J. (2002). Working with specialized language. A practical guide to using
corpora. London: Routledge.
Colina, S. (2008). Translation quality evaluation: Empirical evidence for a functionalist approach.
The Translator, 14(1), 97–134. doi:10.1080/13556509.2008.10799251
Colina, S. (2009). Further evidence for a functionalist approach to translation quality evaluation.
Target, 21(2), 235–264. doi:10.1075/target.21.2.02col
Corpas, G. (2001). Compilación de un corpus ad hoc para la enseñanza de la traducción inversa
especializada. TRANS. Revista de traductología, 5, 155–184. Retrieved from
http://www.trans.uma.es/Trans_5/t5_155184_GCorpas.pdf
De Rooze, B. (2006). La traducción, contra reloj. Consecuencias de la presión por falta de tiempo en el
proceso de traducción. (Unpublished Doctoral Dissertation). University of Granada, Spain.
Retrieved from: http://isg.urv.es/library/papers/DeRooze-DissDraft03.pdf
EC. (2009). Programme for Quality Management in Translation. 22 Actions. Retrieved from http://
ec.europa.eu/dgs/translation/publications/studies/quality_management_translation_en.pdf
EC. (2012). Quantifying quality costs and the cost of poor quality in translation: Studies on trans-
lation and multilingualism. Retrieved from http://ec.europa.eu/dgs/translation/publications/
studies/index_en.htm
Escandell, M. V. (1996). Introducción a la pragmática. Barcelona: Ariel Lingüística.
Gerzymisch-Arbogast, H. (2001). Equivalence parameters and evaluation. Meta: Journal des
traducteurs, 46(2), 227–242. doi:10.7202/002886ar
Gómez González-Jover, A. (2006). Terminografía, lenguajes profesionales e intermediación
interlingüística. Aplicación metodológica al léxico especializado del sector industrial del
calzado y de las industrias afines. (Doctoral Dissertation) University of Alicante, Spain.
Retrieved from: http://rua.ua.es/dspace/bitstream/10045/760/1/tesis_doltoral_adelina_gomez.pdf
Gouadec, D. (1981). Paramètres de l'évaluation des traductions. Meta: Journal des traducteurs,
26(2), 99–116. doi:10.7202/002949ar
Gouadec, D. (1989). Aspects méthodologiques de l'évaluation de la qualité du travail en
interprétation simultanée. Meta, 28(3), 236–243.
Hönig, H. G. (1998). Positions, power and practice: Functionalist approaches and translation
quality assessment. In C. Schäffner (Ed.), Translation and quality (pp. 6–34). Clevedon:
Multilingual Matters.
House, J. (1997). Translation quality assessment: A model revisited. Tübingen, Germany: Gunter
Narr.
Jiménez-Crespo, M. A. (2009). The evaluation of pragmatic and functionalist aspects in localization:
Towards a holistic approach to Quality Assurance. The Journal of Internationalization and
Localization, 1, 60–93. doi:10.1075/jial.1.03jim
Jiménez-Crespo, M. A. (2011). A corpus-based error typology: Towards a more objective approach
to measuring quality in localization. Perspectives: Studies in Translatology, 19(4), 315–338.
doi:10.1080/0907676X.2011.615409
Kupsch-Losereit, S. (1985). The problem of translation error evaluation. In C. Titford & A. E. Hieke
(Eds.), Translation in foreign language teaching and testing (pp. 169–179). Tübingen:
Gunter Narr.
Kussmaul, P. (1995). Training the translator. Amsterdam: John Benjamins.
Larose, R. (1998). Méthodologie de l'évaluation des traductions. [A method for assessing translation
quality]. Meta: Journal des traducteurs, 43(2), 163–86. doi:10.7202/003410ar
León, P., Faber, P., & Montero-Martínez, S. (2011). Special language semantics. In P. Faber (Ed.), A
cognitive linguistics view of terminology and specialized language (pp. 95–176). Berlin: De Gruyter
Mouton.
LISA. (2007). Localisation industry standards association. Retrieved from
http://www.lisa.org/LISA-QA-Model-3-1.124.0.html
Malinowski, B. (1923). The problem of meaning in primitive languages. Supplement I to C. K.
Ogden and I. A. Richards (Eds.), The meaning of meaning (pp. 296–336). New York: Harcourt
Brace and World.
Martin, J. R. (1992). English text. System and structure. Amsterdam: John Benjamins.
Martínez, N., & Hurtado, A. (2001). Assessment in translation studies: Research needs. Meta:
Journal des traducteurs, 46(2), 272–287. doi:10.7202/003624ar
Martínez-Mateo, R. (2014a). Propuesta de evaluación de la calidad en la DGT de la Comisión
Europea: el modelo Funcional-Componencial y las traducciones externas inglés-español.
(Doctoral Dissertation) University of Castilla-La Mancha, Spain. Retrieved from
https://ruidera.uclm.es/xmlui/handle/10578/4120
Martínez-Mateo, R. (2014b). A deeper look into metrics for Translation Quality Assessment
(TQA): A case study. Miscelánea: A Journal of English and American Studies, 49, 73–93.
Montero-Martínez, S., Faber, P., & Buendía, M. (2011). Terminología para traductores e intérpretes:
Una perspectiva integradora. Granada: Ediciones Tragacanto.
Mossop, B. (2007). Revising and editing for translators. Manchester, UK: St. Jerome.
Neunzig, W. (2002). Estudios empíricos en traducción: apuntes metodológicos. Cadernos de
Tradução, 10, 75–96. Retrieved from http://www.cadernos.ufsc.br/
Nord, C. (1997). Translation as a purposeful activity. Manchester, UK: St. Jerome.
Nord, C. (2009). El funcionalismo en la enseñanza de la traducción. Mutatis Mutandis, 2, 209–243.
O'Brien, S. (2012). Towards a dynamic quality evaluation model for translation. Jostrans: The
Journal of Specialized Translation, 17. Retrieved from
http://www.jostrans.org/issue17/art_obrien.php
Orozco, M. (2001). Métodos de investigación en traducción escrita: ¿qué nos ofrece el método
científico? Sendebar, 12, 95–115.
Parra, S. (2005). La revisión de traducciones en la Traductología: Aproximación a la práctica de la
revisión en el ámbito profesional mediante el estudio de casos y propuestas de investigación.
(Doctoral Dissertation) University of Granada, Spain. Retrieved from
http://digibug.ugr.es/handle/10481/660
Parra Galiano, S. (2007). Propuesta metodológica para la revisión de traducciones: principios
generales y parámetros. TRANS (Revista de Traductología), 11, 197–214.
Reiss, K., & Vermeer, H. (1996). Fundamentos para una teoría funcional de la traducción. Madrid:
Akal.
Rothe-Neves, R. (2002). Translation quality assessment for research purposes: An empirical
approach. Cadernos de Tradução, 10, 113–131.
SAE J2450. (2001). Surface vehicle recommended practice. SAE. The engineering society for advan-
cing mobility land sea air and space. USA.
Sans, A. (2004). Métodos de investigación de enfoque experimental. In R. Bisquerra (Ed.),
Metodología de la investigación educativa (pp. 167–194). Madrid: Editorial La Muralla.
Schäffner, C. (1998). From good to functionally appropriate: Assessing translation quality. In
Translation and quality (pp. 1–5). Clevedon: Multilingual Matters.
Simon, M., & Forgette-Giroux, R. (2001). A rubric for scoring postsecondary academic skills.
Practical Assessment, Research & Evaluation, 7(18). Retrieved from:
http://pareonline.net/getvn.asp?v=7&n=18
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Varantola, K. (2002). Disposable corpora as intelligent tools in translation. Cadernos de Tradução,
9, 171–189.
Waddington, C. (2000). Estudio comparativo de diferentes métodos de evaluación de traducción
general (inglés-español). Madrid: Universidad Pontificia de Comillas.
Williams, M. (1989). The assessment of professional translation quality: Creating credibility out of
chaos. TTR: Traduction, Terminologie, Rédaction, 2(2), 13–33. doi:10.7202/037044ar
Williams, M. (2001). The application of argumentation theory to translation quality assessment.
Meta: Journal des traducteurs, 46(2), 327–344. doi:10.7202/004605ar
Wimmer, S. (2011). El proceso de la traducción especializada inversa: modelo, validación empírica y
aplicación didáctica. (Doctoral Dissertation) Universitat Autònoma de Barcelona (Spain).
Retrieved from http://ddd.uab.cat/pub/tesis/2011/hdl_10803_42307/sw1de1.pdf

Appendix 1

Appendix 2

Appendix 3
Table 1. Secondary corpus.

Text    Subject                                             Grade (pretest)  Grade (posttest)  Collection date          Text size
Text 1  EAC (DG Education and Culture)                      Very good        Good              Third trimester of 2009  c. 500–600 words
Text 2  EMPL (DG Employment, Social Affairs and Inclusion)  Good             Below average     Third trimester of 2009  c. 500–600 words
Text 3  ENV (DG Environment)                                Below average    Unacceptable      Third trimester of 2009  c. 500–600 words
Text 4  JLS (DG Justice, Liberty and Security)              Very good        Very good         Third trimester of 2009  c. 500–600 words
Text 5  ENTR (DG Enterprise and Industry)                   Acceptable       Below average     Third trimester of 2009  c. 500–600 words
Text 6  TREN (DG Transport and Energy)                      Very good        Very good         Third trimester of 2009  c. 500–600 words

Appendix 4

Table 1. Some examples of errors tagged with MAP Quantitative module.


Notes: Errors are marked in bold type.