Roberto Martínez Mateo, Silvia Montero Martínez & Arsenio Jesús Moya Guijarro
To cite this article: Roberto Martínez Mateo, Silvia Montero Martínez & Arsenio Jesús Moya Guijarro (2016): The Modular Assessment Pack: a new approach to translation quality assessment at the Directorate General for Translation, Perspectives, DOI: 10.1080/0907676X.2016.1167923
Downloaded by: [UGR-BTCA Gral Universitaria], [Silvia Montero Martínez] Date: 04 May 2016, At: 04:57
PERSPECTIVES, 2016
http://dx.doi.org/10.1080/0907676X.2016.1167923
aDepartment of Modern Philology, University of Castilla-La Mancha, Cuenca, Spain; bDepartment of Translation and Interpreting, University of Granada, Granada, Spain
1. Introduction
This article reviews two methodologies for Translation Quality Assessment (TQA) in order to remedy the weaknesses detected in the Quality Assessment Tool (QAT), a prototype quantitative tool developed by the Directorate General for Translation (DGT). Discussions on how to determine the quality of a translation tend to be linked to relativity and subjectivity. This is partly because of the blurred borders of the concept of quality itself (Bowker, 2001, p. 347) and partly because of the necessary participation of a rater (the human factor) in the assessment (House, 1997, p. 47). That is why, currently, two mainstream approaches coexist in TQA: qualitative models (Colina, 2009) that offer a global assessment from a macrotextual viewpoint (top-down); and quantitative (Williams, 1989), analytic (Waddington, 2000) or practical (Colina, 2009) models that offer a microtextual approach (bottom-up). The latter are more widely used in professional settings.
With a view to building a new theoretical approach to TQA with a practical tool, we carried out a critical examination of the most representative quantitative and qualitative models.1 Quantitative tools (also known as metrics) include: the SICAL (Système Canadien d'appréciation de la Qualité Linguistique) (Williams, 1989, p. 14); the LISA Quality Assurance Model (LISA, 2007, p. 43); the SAE J2450 (SAE J2450, 2001, pp. 1-3); the TAUS Dynamic Quality Evaluation (QE) Model, one of the latest and most significant contributions to TQA, developed by O'Brien (2012) in collaboration with the Translation Automation User Society (TAUS);2 and the QAT (EC, 2009, pp. 11-12), a prototype tool developed by the DGT of the European Commission to quantify the quality of external translations.
All these stand-alone tools apply quality control procedures (Parra, 2005) (SAE J2450 also allows for quality assurance). However, these TQA metrics present some common weaknesses: they rely on rating scales that lack an explicit theoretical base (Colina, 2008, 2009; Jiménez-Crespo, 2011); they rely on the central concept of error as the defining element of their assessment model and, subsequently, on related issues such as error type, severity and error weightings, which sometimes present an unclear definition (Parra, 2005); they shape their definition of a quality translation as an error-free text or a text with a number of errors (and their allocated points) that does not surpass a predefined limit (acceptability threshold); they consider the notion of error as absolute, disregarding its functional value (Martínez & Hurtado, 2001); and they identify and tag errors in isolation rather than in relation to their context and function within the text (Nord, 1997). The line that separates the error categories is sometimes so thin or blurred that different reviewers might classify the same error into different categories, and the search for errors is limited to the word and sentence level without taking into account the larger unit of the text or the communicative context (Colina, 2008, 2009; Nord, 1997; Williams, 2001). Moreover, the reviser carries out a partial revision (Parra, 2007) of a selected sample, so the representativeness of the limited, variable-length sample could be questioned (Gouadec, 1981; Larose, 1998), and these metrics do not specify what type of revision has to be made (i.e. unilingual, comparative, etc.; Parra, 2005).
Despite these drawbacks, these quantifying systems fill a gap in the professional TQA arena (Jiménez-Crespo, 2011), in which translation becomes a business with constraints of time (De Rooze, 2006) and budget (O'Brien, 2012). Moreover, metrics have the following advantages: shared repetitive macro-error categories; a clear quality categorization, with an acceptability threshold and different quality ranges; and the fact that
(ATA) rubric for grading (v. 2011).3 Both Colina's framework and the American Translation Association rubric adopt a textual and a functional approach to TQA, as they analyze the product of the translation (a text) taking the intended purpose of the translation as the key criterion to determine its linguistic quality. They rely on a double-entry table that relates dimensions (assessment criteria that correspond to smaller units of the quality construct in translation), command levels and, at the intersection, level descriptors (Barber & De Martín, 2009, p. 99). The success of this kind of tool lies in the right choice and definition of these three items.
The advantages of rubrics are: they provide a reference framework for the rater that facilitates decision-making based on limited, known and transparent criteria, which limits the subjective burden inherent to any assessment process with human intervention; it is only necessary to allot positive or negative values to the descriptors to offer a summative valuation; they assess translation as a product in a given instance from a top-down approach, which offers a general valuation of the text; and, as they are based on descriptive propositions (descriptors), dimensions simplify the rater's task and allow the rater to concentrate on the inadequacies from the medium range of the quality continuum (Jiménez-Crespo, 2009, p. 76).
On the other hand, the following deficiencies have been noticed: dimensions have to adequately describe the object they define, and descriptors have to convey the essence of the feature they aim to assess; currently, there is no experimental verification of the existing models of translation competence, so the identification of the dimensions and command levels has been based on those models which enjoy a long tradition and dissemination; and when the translator only receives the final score, its capacity to offer meaningful information about the overall quality may be diminished (Simon, 2001). To sum up, while practical models have an extensive application, they also have limited transferability, as they lack theoretical foundation and empirical validation (Colina, 2009, p. 237; Jiménez-Crespo, 2011, p. 316). Also, these models rely on error quantification, a central issue in the debate about academic and professional assessment (Jiménez-Crespo, 2011; Kussmaul, 1995). Meanwhile, theoretical models offer a global view, but they lack the applicability that professional translation demands. In this context, this paper introduces a new tool for TQA, based on the Functional-Componential Approach (FCA) (Martínez-Mateo, 2014a). This proposal aims to remedy the deficiencies found in the freelance translation quality assessment tool devised at the Spanish Department of the DGT, by combining both mainstream methodologies and embedding the theoretical underpinnings of Skopos theory. It also describes and discusses the preliminary results of a small-scale, exploratory pilot study carried out with the Modular Assessment Pack (MAP) tool (a quantitative- and qualitative-based application). Finally, some observations on the MAP's potential adjustments to other professional contexts are made.
Since Skopos theory (Reiss & Vermeer, 1996) establishes a link between a translation's quality and its adequacy for purpose, it can be asserted that a quality translation has to be functionally, pragmatically and textually adequate (Colina, 2009; Nord, 1997, p. 137). Likewise, error definition and error typology in the FCA have to take into account the relative and functional value of error within the situational context (Nord, 1997, p. 73). Consequently, this approach integrates the traditional, quantitative bottom-up approach with a new, qualitative top-down approach that allows for the consideration of suprasegmental issues (Waddington, 2000, p. 394). While these two elements start from opposite ends of textual analysis, by providing them with functional quality criteria, they each become part of a qualitative-quantitative methodological continuum (Orozco, 2001, p. 102).
This functional and componential approach to TQA materializes in the MAP, a tool
comprised of two modules. The qualitative module is a four-dimensional assessment
rubric (see Section 3.3.1), in which the dimensions constitute the construct used by
the FCA to describe a good translation, relating it to the functional notion of adequacy.
Every dimension has several descriptors with associated points (see Appendix 1). The
metric module, with a calculator interface, includes an error typology with allotted
points. Therefore the MAP offers two quality indicators of the text, which provide a
comprehensive approach to translation from a macrotextual and microtextual
viewpoint.
(1) Using a TQA tool that combines a qualitative and quantitative approach allows for the
provision of a more comprehensive view of a quality translation.
(2) Using the MAP for the TQA of the DGTs outsourced translations improves inter-
and/or intra-rater reliability.
del Departamento de Lengua Española (the Spanish Department's Guide), organizing in-house training and coordinating all quality-related issues (including analyzing the linguistic quality of translations, both internally and externally). The extensive experience of these two members, in addition to their wide range of functions and their valuable experience with TQA, allows them to be considered internal experts. These respondents took part in an assessment at the end of 2013, in which they evaluated the secondary corpus with the help of the new MAP tool. Moreover, they completed a questionnaire aimed at gathering information on three topics: the respondents' profile, the corpus and the MAP tool.
3.3. Tools
The posttest aims to empirically validate the MAP and its two modules. However, within
the framework of this study, the MAP also serves a second function: that of a collection
tool that gathers information via the questionnaire.
The principal contribution of the MAP to TQA is its functional-componential rubric. It is based on the analysis and improvement of two textual and functional approaches: Colina's (2008, 2009) model and the ATA rubric (v. 2011). The Functional-Componential Approach embeds Nord's functional view of translation quality into the four dimensions into which the quality construct is broken down, and it is thus also an integral part of the MAP.
and pragmatic criteria. In fact, there is no clear division between them; rather, they are adjoining and overlapping realities (Montero-Martínez, Faber, & Buendía, 2011, p. 93).
The third dimension, Non-specialized lexical units and content adequacy, describes the TT's conveyance of non-specialized lexical units and content in an adequate and coherent manner. Here, the transferred knowledge corresponds to a basic categorization of the world, and is verbalized through lexical units with non-specialized semantic features (Montero-Martínez et al., 2011, p. 22). This adequate use of language includes compliance with the language usage norms of the TT speaker community.
The last dimension, Normative and stylistic adequacy, focuses on the observance of grammar, spelling and punctuation rules in the TT and the use of an adequate style, bearing in mind the aim and the target audience. As Malinowski (1923) states, texts have to be understood in relation to their context of culture (genre), but also in relation to their context of situation (register). Thus texts, in relation to the extratextual (Gómez, 2006, p. 422) and intratextual situations, have to be clear, precise and concise. Three factors are considered when studying the relationship between language and the specific communicative situation in which it is used: field, tone and mode (Martin, 1992). Field refers to the subject dealt with by the Directorates-General of the EC; tone refers to the protagonists of communication, who, in this text type, are a writer who is an expert in the field and a general reader with varying degrees of knowledge; and mode refers to the communication channel, which in this case is written.
The assessment rubric is a table in which the dimensions (columns) and the levels of mastery (rows) intersect with the descriptors (cells) (Table 2). The descriptors define the concept alluded to by each dimension, drawing a quality continuum from the highest to the lowest level of adequacy of the described feature. In addition, points are allotted to the grades obtained in order for them to be operative (Mossop, 2007, p. 184). The thicker horizontal line in Table 2 shows the acceptability threshold between the pass and fail categories. In the MAP, each descriptor is associated with a number of allotted points that are added up to arrive at the final count. These points vary depending on the rank of each dimension in the order of preference (Table 2).
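The scoring mechanics just described (one descriptor ticked per dimension, points summed to a final count, and a pass/fail cut at the acceptability threshold) can be sketched in code. This is only an illustration: the dimension names, descriptor points and the 60-point threshold below are placeholder assumptions, not the MAP's actual values (which are given in Table 2 and Appendix 1).

```python
# Sketch of a double-entry rubric: dimensions (columns) x mastery
# levels (rows), with points attached to every descriptor (cell).
# All names, point values and the threshold are illustrative.
RUBRIC = {
    "Pragmatic adequacy":      {"excellent": 30, "acceptable": 20, "poor": 5},
    "Specialized lexis":       {"excellent": 25, "acceptable": 15, "poor": 5},
    "Non-specialized lexis":   {"excellent": 25, "acceptable": 15, "poor": 5},
    "Normative and stylistic": {"excellent": 20, "acceptable": 10, "poor": 0},
}

ACCEPTABILITY_THRESHOLD = 60  # placeholder pass/fail cut-off

def score_rubric(ticked):
    """Sum the points of the descriptor ticked in each dimension."""
    return sum(RUBRIC[dim][level] for dim, level in ticked.items())

ticked = {
    "Pragmatic adequacy": "acceptable",       # 20
    "Specialized lexis": "excellent",         # 25
    "Non-specialized lexis": "acceptable",    # 15
    "Normative and stylistic": "acceptable",  # 10
}
total = score_rubric(ticked)  # 70
verdict = "pass" if total >= ACCEPTABILITY_THRESHOLD else "fail"
```

Because each dimension contributes a fixed number of points per descriptor, the rater's only task is ticking cells; the summation and the threshold comparison are mechanical.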
integrate quantitative and qualitative aspects. Figure 1, from the left, links, one by one, each dimension of the FCA to the two macro-error types (pragmatic and linguistic). From the right, analogous links between the QAT's eight error types and Nord's functional errors10 are established.
Third, the reviewers identified errors by comparing the source and the target texts according to the requirements of the assignment, usage conditions and target-culture conventions (Nord, 2009, p. 237). Thus, the reviewer determines whether an error is linguistic or pragmatic (functional); that is to say, he/she assesses the error according to its impact on the function of the text, bearing in mind the communicative effect on the reader (Kussmaul, 1995, p. 132).
Fourth, the QAT's textual profiles have been replaced with the DGT's binary textual classification, which categorizes texts into two types (Quality Control 1 (QC1) and Quality Control 2 (QC2)) according to the quality control level they undergo. These control levels depend on the aim and quality requirements of the TT. The reviewer simply has to look up the text type in the lists found in the Revision Manual of the Spanish Department of the DGT.11
Fifth, as one of the greatest deficiencies of quantitative models has to do with the variability in the tagging of errors once they are detected, the meta-rules12 of the SAE J2450 model are incorporated in order to standardize error tagging. Furthermore, a preference order of errors (Martínez-Mateo, 2014a, p. 256) is set up according to textual profile. This follows a top-down hierarchy. Figures 2 and 3 show the order of preference that, when in doubt, the reviewer will follow to tag errors. Errors are ordered from the highest to the lowest, with decreasing penalty values.
Thus, when in doubt about how to tag an error using the MAP quantitative module, the reviewer will choose: pragmatic over linguistic; the error type (SENS, TERM, etc.) according to the order of preference; and high over low.
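These tie-breaking rules amount to a lexicographic ordering, which can be sketched as follows. The preference list below is an invented placeholder; the actual QC1/QC2 orders are those given in Figures 2 and 3.

```python
# Lexicographic tie-break for ambiguous error tags, per the MAP rules:
# 1) pragmatic beats linguistic; 2) the earlier type in the preference
# order wins; 3) high severity beats low. The list is illustrative.
PREFERENCE_ORDER = ["SENS", "TERM", "OM", "GR", "SP", "PT"]

def tag_rank(nature, err_type, severity):
    """Lower tuple sorts first, i.e. is the preferred tag when in doubt."""
    return (
        0 if nature == "pragmatic" else 1,
        PREFERENCE_ORDER.index(err_type),
        0 if severity == "high" else 1,
    )

candidates = [
    ("linguistic", "GR", "high"),
    ("pragmatic", "TERM", "low"),
    ("pragmatic", "SENS", "low"),
]
chosen = min(candidates, key=lambda c: tag_rank(*c))
# pragmatic beats linguistic, and SENS precedes TERM in the order
```

Encoding the rules as a sort key makes the tie-break deterministic: two reviewers applying it to the same ambiguous error must arrive at the same tag.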
3.3.3. Questionnaire
The questionnaire was designed to take into account the objectives and the empirical variables of this study, and is therefore structured into three content blocks.13 The first block (Sections I-III) focuses on the academic and professional profile of the respondents. The second block (Section IV) looks into the theoretical underpinnings of the corpus regarding its adequacy, representativeness and sample extraction. The third block (Sections V-VI) deals with the MAP. Specifically, it poses questions regarding the qualitative module: the relevance of the dimensions, the clarity of the descriptors and the suitability of the scores. As for the quantitative module, the new text classification is assessed, as is the appropriateness of the error typology and the associated weights. The last three questions ask the respondent for an overall assessment of the MAP.
As far as the design is concerned, the funnel technique was used, i.e. the questionnaire goes from the general to the particular. In order to allow for a wide range of answers, open, dichotomous and polychotomous choice questions were employed. The language used is clear and simple, in an attempt to create exclusive, unambiguous questions. The suitability of the questionnaire was subsequently validated by an expert on Methods of Research and Diagnostics in Education from the University of Castilla-La Mancha (Spain) and by three experts on Translation and Interpreting Quality Assessment from the University of Granada (Spain) with the help of a validation guide (Martínez-Mateo, 2014a, pp. 392-399).
Dossier. The folder contained instructions to guide the reviewer in the process, the corpus,
the MAP tool and the questionnaire.
3.4.1. Procedure
Efficiency and efficacy governed all the decisions regarding the empirical development of the test. Thus, the initial plan of holding an on-site training session with the two participants was discarded due to their time limitations. Instead, they were contacted via email. Before commencing the study, they knew nothing about the test. Since these two members of the QCG had not participated in the pretest, they first had to be validated as suitable candidates for the posttest (see Section 3.4.2). Then came the actual test. It was intended as an empirical validation of the conceptual and methodological improvements on the QAT that shaped the MAP. To carry out the process, a two-week deadline was agreed with the participants. In a second round of emails, we sent the participants a dossier that contained two Word files and three folders. One Word file described the general framework of the study and included all the necessary information to complete the test. The first folder included the secondary corpus texts. The second folder comprised the MAP tool, with its two modules, and instructions on how to use them. Here the theoretical approach, the functions available and the customization capacity of the MAP were summarized. As for the new qualitative module, a read-through of the assessment rubric and its brief instructions was sufficient for them to learn to use it. The third folder was composed of two subfolders: one for storing the assessment reports of the qualitative module and another for the quantitative module reports. The last Word file was the questionnaire, to be completed at the end.
The most important information contained in the dossier concerned the implementation of the MAP, although the participants' extensive knowledge of the QAT facilitated this task. Nonetheless, a more detailed step-by-step explanation of the procedure they were provided with follows.
Using the Track Changes feature in Word, reviewers had to mark the errors in the selected sample. First, they evaluated the texts with the help of the MAP qualitative module. For that purpose, they followed the order of preference of the dimensions in the assessment rubric, from left to right. Next, they registered the mark obtained in each dimension in the file 'Assessment summaries' (see Appendix 1), the final count of which appears in the 'Final mark' column. Then they evaluated the texts according to the MAP quantitative module (Figure 4). First, they had to choose the text Profile (QC1 or QC2) and select the page number (Pages). Next, they compared the original text (OT) and the TT to pinpoint errors and tag them according to their nature, type and severity. To calculate the Final grade, the values of both modules are added together and then divided by two (MAP Assessment summary in Appendix 1). That result is
recorded in the Recommendation table (Appendix 1), and corresponds to the tool's recommendation for that particular translation.
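The arithmetic behind the Final grade (the two module scores averaged and then mapped to a quality range) can be sketched as follows. The band labels match those used in the results, but the numeric cut-offs here are assumptions for illustration.

```python
def final_grade(qualitative, quantitative):
    """Average the two module scores, as prescribed for the MAP."""
    return (qualitative + quantitative) / 2

def recommendation(grade):
    """Map a 0-100 grade to a quality range (cut-offs are assumed)."""
    bands = ((85, "Very good"), (70, "Good"),
             (60, "Acceptable"), (0, "Below average"))
    for floor, label in bands:
        if grade >= floor:
            return label

# E.g. qualitative 90 and quantitative 94 average to a final grade of 92.
grade = final_grade(90, 94)
```

With the assumed cut-offs, a 92 falls in the Very good band and a 47.5 in the Below average band, consistent with the quality ranges reported in the results tables.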
The a priori and experimental restrictions of this research conditioned its development and the respondents' perception.
From a gender viewpoint, the data collected show that, on average, female intra-rater
reliability was Acceptable (66.6%) and male intra-rater reliability was Below average (27.3%).
According to these data, overall intra-rater reliability can be regarded as low, with an agreement rate of 43% and a slightly higher divergence rate of 57%. Nonetheless, reliability is inconclusive, as these are mid-range values.
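Read as raw percent agreement, these figures correspond to the simple proportion of coinciding verdicts. A minimal sketch with hypothetical data:

```python
def percent_agreement(pairs):
    """Share of cases (0-100) in which the two verdicts coincide."""
    matches = sum(1 for a, b in pairs if a == b)
    return 100 * matches / len(pairs)

# Hypothetical verdict pairs (same rater, two tools): 3 of 7 agree,
# so agreement is roughly 43% and divergence roughly 57%.
pairs = [("pass", "pass"), ("pass", "fail"), ("fail", "fail"),
         ("pass", "fail"), ("fail", "pass"), ("pass", "pass"),
         ("fail", "pass")]
agreement = percent_agreement(pairs)
divergence = 100 - agreement
```

Note that percent agreement is the crudest reliability measure; chance-corrected statistics (e.g. Cohen's kappa) would be stricter, but the study reports plain percentages.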
It is also worth noting that, in the case of the Spanish Department, the QAT was implemented following the standard guidelines because, due to the nature of its in-house staff, it is one of the Linguistic Departments of the DGT that outsources the fewest translations. Hence, they used all the preset values and did not make any adjustments. This fact undermines the QAT's value, for, as the developers of the QAT stated, its greatest virtue lies in its customization capacity.
Text / Rater      Qualitative module  Quantitative module  Final mark  Quality range
Text 3, Rater A   50                  45                   47.5        Below average
Text 3, Rater B   50                  52.5                 51.25       Below average
Text 4, Rater A   90                  94                   92          Very good
Text 4, Rater B   90                  94                   92          Very good
Text 5, Rater A   50                  50                   50          Below average
Text 5, Rater B   50                  42.5                 46.25       Below average
Text 6, Rater A   75                  61.6                 68.3        Acceptable
Text 6, Rater B   75                  61.6                 68.3        Acceptable
Regrettably, there were no instances of the new error type, AD, so no conclusions can be
reached.
Regarding the results obtained with the qualitative module, reviewers agreed in all
cases. An example of this can be found in Table 7, which shows that, in texts 1 and 3,
reviewers have ticked the same descriptors for each dimension.
4.2.1.1. Intra-rater reliability. The intra-rater reliability of the posttest is analyzed by comparing the results obtained by each reviewer (A and B) using comparable tools (intra-rater assessment): the QAT in the validation phase and the MAP quantitative module in the posttest.
As Table 8 shows, the scores obtained by reviewers A and B with the help of the QAT and the MAP quantitative module agree in three texts and differ in the other three (Martínez-Mateo, 2014a, p. 312).
These data show a low intra-rater reliability for both reviewers using the QAT and the MAP quantitative module, as 50% is considered to be Below average.
4.2.1.2. Inter-rater reliability. The inter-rater reliability of reviewers A and B was analyzed in four cases.
First, we compared the marks obtained by both reviewers when assessing the same text with comparable tools. More precisely, the scores obtained with the MAP quantitative module were compared with the results gathered with the QAT in the expert validation phase. Table 9 shows that texts 1, 2, 4 and 5 (66.6%) had similar marks and texts 3 and 6 had differing marks (Martínez-Mateo, 2014a, p. 324). All these data confirm that the quantitative-based tools tested produce an acceptable inter-rater reliability (66.6%).
Table 9. Grades obtained with the QAT and the MAP quantitative module (inter-rater).

         Expert validation (QAT)      Posttest (MAP quantitative module)
         Rater A    Rater B           Rater A    Rater B
Text 1   79         70                86         79
Text 2   40         40                50         60
Text 3   30         37.5              45         52.5
Text 4   94         94                94         94
Text 5   40         40                50         42.5
Text 6   73.3       73.3              61.6       61.6
However, the distribution of marks in a bar graph (Figure 6) does not show any regular pattern. It is only worth mentioning that the lowest marks (text 3) present a greater dispersion and that the highest marks (text 4) appear closer together.
Second, the marks obtained by reviewers A and B using the MAP qualitative module (Table 7) agreed in all cases. Additionally, the reviewers ticked the same descriptors for each dimension (Martínez-Mateo, 2014a, p. 325).
Third, the marks obtained using the MAP quantitative module agree in four of six texts (66.6%; Table 9), which points to an acceptable inter-rater reliability.
Fourth, the final marks given by reviewers A and B using the MAP agreed in five out of six texts (83.3%; Table 4). In other words, inter-rater reliability is between good and very good; besides, in the remaining text, the difference is a mere 0.5 points (Martínez-Mateo, 2014a, pp. 325-326).
4.3. Discussion
In the pretest, overall intra-rater reliability is low (39.81%), as evidenced by the two indicators stemming from the evaluations with the Traditional methodology and the QAT. The agreement rate is low (43%, Unacceptable) and the divergence rate is acceptable (57%). Also, the QAT produced lower marks than the Traditional methodology. These poor values are due to the fact that the QAT used default settings that penalize low errors with three points and high errors with ten, disregarding any other variables. In addition, the analysis of errors tagged with the qualitative tools demonstrates that the reviewers identified different errors in the same translation and, even when they did detect the same error, they sometimes tagged it differently, which brought about different scores.
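Under such default settings, the quantitative score depends only on the error counts and two fixed weights. A sketch of this kind of penalty scheme (the 100-point base score is an assumption for illustration):

```python
# QAT-style default penalties as described: a low error always costs
# 3 points and a high error 10, regardless of any other variable.
PENALTY = {"low": 3, "high": 10}

def quantitative_score(errors, base=100):
    """Deduct a fixed penalty per detected error from an assumed base."""
    return base - sum(PENALTY[severity] for severity in errors)

score = quantitative_score(["low", "low", "high"])  # 100 - 3 - 3 - 10 = 84
```

Because the weights ignore text type and error nature, two very different translations can receive identical scores, which is precisely the rigidity the MAP's customizable point-deduction scheme addresses.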
In response to the aforementioned deficiencies, two of the material amendments inserted in the MAP quantitative module were: the distinction between pragmatic and linguistic errors, together with the setting of an order of errors per text type (applicable when in doubt); and the application of a point-deduction scheme according to text type and the nature and seriousness of the error.
Another conclusion that can be drawn is that, of the 13 translations with similar marks (Figure 5), 11 received Good and Very good marks, and only two received mid-range marks bordering the acceptability threshold. This corroborates the initial assumption that marks in the mid-range of the quality continuum gave rise to varied opinions, whereas the Good and Very good marks generated higher levels of consensus amongst raters (Martínez-Mateo, 2014a, p. 314).
Finally, and from a gender perspective, results show higher female intra-rater reliability
(66.6%); it would therefore have been desirable to have a larger number of female respon-
dents in order to explore the reasons for those discrepancies.
In the posttest, a comparison of the results obtained by the reviewers with the QAT in the validation phase and those obtained with the MAP quantitative module reveals the low intra-rater reliability of the quantitative-based tools (which only reach 50% agreement per reviewer; Table 8). This poor percentage of agreement highlights the quantitative-based tools' deficiencies in TQA, since they only provide a partial, microtextual view of quality, based only on the penalization of linguistic errors. In consequence, this biased view should be complemented with a top-down approach.
The evaluation of inter-rater reliability has also been based on the results obtained with the MAP as a whole, and with the qualitative and quantitative modules separately. The total coincidence of the results obtained by raters A and B with the MAP qualitative module, not only in the final score but also in every one of the descriptors ticked per dimension (Table 4), underlines the potential value of the TQA model presented in this study. This module provides the reviewer with a reference framework that facilitates decision making according to a limited, practical, known, transparent and customizable set of criteria, restricting the subjectivity of those decisions and enhancing interobjectivity.
Marks obtained by raters A and B with the quantitative module coincide in four out of six texts (Table 9), and the disagreeing marks only vary by one point, which causes a change of quality range. The rise in inter-rater reliability with the MAP quantitative module, in comparison to the QAT, indicates the positive effect of the improvements.
The comprehensive results of the reviewers of the posttest with the MAP offer high inter-rater reliability (Table 4). This seems to prove that the inclusion of the qualitative module in the MAP contributes to the provision of a more holistic view of the analyzed text.
Finally, the opinions expressed by the reviewers in the questionnaire provided positive and negative feedback, depending on the issue. Based on the analysis of the respondents' questionnaires, the training procedure used in this study seems to have some flaws regarding the full understanding of the MAP's functionality. This stresses the need for an in-person training session to present in full the theoretical underpinnings and capacity of the tool. Such a session could take between one and two hours for reviewers with previous knowledge of TQA aid tools.
In addition, the usefulness of the new error type (AD), as well as the possibility of inserting another error type referring to internal coherence, will be evaluated, following the recommendations provided by the respondents. The specification of an error typology has long been a controversial issue; there are several proposals, from the early ones of Kupsch-Losereit (1985), Gouadec (1989), Nord (1997) and Mossop (2007) to recent ones by Jiménez-Crespo (2009, 2011) and O'Brien (2012). Nevertheless, far from being a settled issue, it still generates heated debate, which calls for more empirical testing of the previous proposals.
5. Conclusions
Although we are aware of the shortcomings of this exploratory analysis, they do not undermine the relevance of the findings. Despite the limitations, the results obtained allow for the corroboration of the first hypothesis: the use of an assessment tool that combines a qualitative with a quantitative approach, and is based on common quality criteria, allows the reviewer to assess the translated text within a unified reference framework that improves interobjectivity and offers a more balanced view of the assessed text, as it approaches the text from two necessary and complementary perspectives.
With regard to the second hypothesis, that is, whether the MAP can be validated as a reliable tool for the TQA of the Spanish Department of the DGT's outsourced translations, the results of this pilot study are inconclusive. This has raised conceptual and methodological considerations with regard to future improvements. The respondents' comments challenged some issues that need re-examination: the need for two separate dimensions to assess general and specialized language; the correspondence between the qualitative module dimensions and the quantitative module errors; and the adjustment of the weight of both modules.
Once improvements to the conceptual model and the MAP have been implemented, a
larger-scale study (with a larger corpus and a greater number of reviewers) should provide
stronger evidence of the conceptual and methodological validity of the tool. Only then
could the tool be adjusted to other linguistic combinations within the institutional
context of the DGT, or to other professional settings, given its excellent benefit-cost ratio.
Notes
1. For an extensive review of the characteristics, weaknesses and strengths of these TQA models
and tools, see Martínez-Mateo (2014a, 2014b).
2. It provides a customizable modular TQA system for the selected content types and quality
criteria, which allows for adaptability to client preferences.
3. For more information, visit the website: http://www.atanet.org/certification/aboutexams_rubic.pdf
4. The Guide for External Translators aims to provide external contractors with information
about the procedure and the technical and quality requirements that externalized translations
must fulfill.
5. This simple revision (Parra, 2005, p. 362) is based on known criteria and fulfills summative (issues an assessment), formative (teaches the freelance translator from his or her errors) and corrective (makes amendments) functions (Martínez & Hurtado, 2001, p. 277).
6. There are 36 subjects dealt with by the EC. For further information, visit http://europa.eu/
pol/index_en.htm
7. Due to space restrictions and to ease the subsequent comparison of results, only the data
related to the corpus, the respondents and the pretest are presented here. A complete description of all the test elements can be found in Martínez-Mateo (2014a, pp. 284–292).
8. Inter-Active Terminology for Europe (IATE) is the EU's inter-institutional terminology database. It is publicly available at http://iate.europa.eu/
9. For the purposes of this study, cultural errors are included in pragmatic ones, since the former are inadequacies related to world knowledge, and are not inferred from linguistic signs or rules alone (linguistic errors) in a straightforward manner (Martínez-Mateo, 2014a, p. 248).
10. An in-depth description of the correspondences established can be found in Martínez-Mateo (2014a, pp. 202–223).
11. Available on the website: http://ec.europa.eu/translation/spanish/guidelines/documents/
revision_manual_es.pdf
12. (1) when in doubt, always choose the earliest primary category; and (2) when in doubt,
always choose serious over minor.
13. The questionnaire itself is available upon request to the authors via email.
14. General coincidence percentage = total records/total participants.
15. The freelance and rater translations in the tables and in Appendix 4 have been glossed in
English.
Acknowledgments
We would like to thank the members of the Spanish Department of the Directorate-General for Translation of the European Commission for their willingness and cooperation. We would also like to thank the Editor and the anonymous referees for their useful comments on an earlier version of this article. Finally, we thank Maria Baldarelli for the language editing of the text.
Disclosure statement
No potential conict of interest was reported by the authors.
Funding
This research is part of the project Cognitive and Neurological Bases for Terminology-enhanced Translation (CONTENT) (FFI2014-52740-P), funded by the Spanish Ministry of Economy and Competitiveness.
Downloaded by [UGR-BTCA Gral Universitaria], [Silvia Montero Martnez] at 04:57 04 May 2016
Notes on contributors
Roberto Martínez Mateo holds a degree in English Philology from the University of Valladolid and a degree in Translation and Interpreting from the University of the Basque Country. He gained his PhD in Translation and teaches English as a foreign language, didactics and translation at the University of Castilla La Mancha. His research interests include translation quality assessment, communicative language skills and translation as a teaching tool for Foreign Language Teaching (FLT). He has published in the Journal of Language Teaching and Research, Miscelánea: A Journal of English and American Studies and Ocnos.
Silvia Montero Martínez holds a degree in English Language and Literature and an M.A. in Specialized Translation from the University of Valladolid. She has a PhD in Spanish Linguistics. She lectures on Translation, Terminology and Translation Technologies at the University of Granada. Her main research interests are terminology, specialized translation and knowledge engineering. She is the author of various books and chapters on lexical semantics, translation and terminology. Her work has been published in several international peer-reviewed journals, such as Terminology, Perspectives, META, Babel, and Journal of Pragmatics.
A. Jesús Moya Guijarro does research in Systemic Functional Linguistics and has published several articles on information, thematicity and picture books in international journals such as Word, Text, Functions of Language, Journal of Pragmatics, Text and Talk, Review of Cognitive Linguistics and Atlantis. He is co-editor of The World Told and The World Shown: Multisemiotic Issues (2009, Palgrave Macmillan). He is also the author of the book A Multimodal Analysis of Picture Books for Children: A Systemic Functional Approach (2014, Equinox).
ORCID
Roberto Martínez Mateo http://orcid.org/0000-0001-7110-8789
References
Barberà, E., & Martín, E. (2009). Portfolio electrónico: aprender a evaluar el aprendizaje. Barcelona: Editorial UOC.
Bowker, L. (2001). Towards a methodology for exploiting specialized target language corpora as translation resources. International Journal of Corpus Linguistics, 5(1), 17–52. doi:10.1075/ijcl.5.1.03bow
Bowker, L., & Pearson, J. (2002). Working with specialized language. A practical guide to using
corpora. London: Routledge.
Colina, S. (2008). Translation quality evaluation: Empirical evidence for a functionalist approach. The Translator, 14(1), 97–134. doi:10.1080/13556509.2008.10799251
Colina, S. (2009). Further evidence for a functionalist approach to translation quality evaluation. Target, 21(2), 235–264. doi:10.1075/target.21.2.02col
Corpas, G. (2001). Compilación de un corpus ad hoc para la enseñanza de la traducción inversa especializada. TRANS. Revista de traductología, 5, 155–184. Retrieved from http://www.trans.uma.es/Trans_5/t5_155184_GCorpas.pdf
De Rooze, B. (2006). La traducción, contra reloj. Consecuencias de la presión por falta de tiempo en el
Martin, J. R. (1992). English text. System and structure. Amsterdam: John Benjamins.
Martínez, N., & Hurtado, A. (2001). Assessment in translation studies: Research needs. Meta: Journal des traducteurs, 46(2), 272–287. doi:10.7202/003624ar
Martínez-Mateo, R. (2014a). Propuesta de evaluación de la calidad en la DGT de la Comisión Europea: el modelo Funcional-Componencial y las traducciones externas inglés-español (Doctoral dissertation). University of Castilla La Mancha, Spain. Retrieved from https://ruidera.uclm.es/xmlui/handle/10578/4120
Martínez-Mateo, R. (2014b). A deeper look into metrics for Translation Quality Assessment (TQA): A case study. Miscelánea: A Journal of English and American Studies, 49, 73–93.
Montero-Martínez, S., Faber, P., & Buendía, M. (2011). Terminología para traductores e intérpretes:
Appendix 1
Appendix 2
Appendix 3
Table 1. Secondary corpus.
Text 1: Subject EAC (DG Education and Culture); grades: pretest Very good, posttest Good.
Text 2: Subject EMPL (DG Employment, Social Affairs and Inclusion); grades: pretest Good, posttest Below average.
Text 3: Subject ENV (DG Environment); grades: pretest Below average, posttest Unacceptable.
Text 4: Subject JLS (DG Justice, Liberty and Security); grades: pretest Very good, posttest Very good.
Text 5: Subject ENTR (DG Enterprise and Industry); grades: pretest Acceptable, posttest Below average.
Text 6: Subject TREN (DG Transport and Energy); grades: pretest Very good, posttest Very good.
Collection date (all texts): third trimester of 2009. Text size (all texts): c. 500–600 words.
Appendix 4