[2.1]
Mark-up conventions
1. SPEAKER IDS
S1:, S2:, …
Speakers are generally numbered in the order they first speak. The speaker ID is given at the
beginning of each turn.
SS:
Utterances assigned to more than one speaker (e.g. an audience), spoken either in unison or
staggered, are marked with a collective speaker ID SS.
SX:
Utterances that cannot be assigned to a particular speaker are marked SX.
SX-f:, SX-m:
Utterances that cannot be assigned to a particular speaker, but where the gender can be
identified, are marked SX-f or SX-m.
SX-1:, SX-2:, …
If it is likely but not certain that a particular speaker produced the utterance in question, this
is marked SX-1, SX-2, etc.
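The speaker ID forms listed above are regular enough to be recognized mechanically. The following is a minimal sketch of that idea, not part of the VOICE conventions themselves; the function name and the category labels are hypothetical and chosen only for illustration.

```python
import re

# Patterns mirror the ID forms described above; category labels are illustrative.
_SPEAKER_PATTERNS = [
    (re.compile(r"S\d+"), "individual"),
    (re.compile(r"SS"), "collective"),
    (re.compile(r"SX"), "unassigned"),
    (re.compile(r"SX-[fm]"), "unassigned, gender known"),
    (re.compile(r"SX-\d+"), "probable speaker"),
]


def classify_speaker_id(speaker_id: str) -> str:
    """Return an illustrative category for a speaker ID such as 'S1' or 'SX-f'."""
    for pattern, category in _SPEAKER_PATTERNS:
        # fullmatch ensures e.g. 'SX-f' is not mistaken for plain 'SX'.
        if pattern.fullmatch(speaker_id):
            return category
    return "unknown format"


print(classify_speaker_id("S7"))    # individual
print(classify_speaker_id("SX-m"))  # unassigned, gender known
```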
2. INTONATION
Words spoken with rising intonation are followed by a question mark “?”.
Example:
S1: that’s what my next er slide? does
Words spoken with falling intonation are followed by a full stop “.”.
Example:
S7: that’s point two. absolutely yes.
3. EMPHASIS
If a speaker gives a syllable, word or phrase particular prominence, this is written in capital
letters.
Example:
S7: er internationalization is a very IMPORTANT issue
Example:
S3: toMORrow we have to work on the presentation already
4. PAUSES
Every brief pause in speech (up to a good half second) is marked with a full stop in
parentheses.
Example:
SX-f: because they all give me different (.) different (.) points of view
Longer pauses are timed to the nearest second and marked with the number of seconds in
parentheses, e.g. (1) = 1 second, (3) = 3 seconds.
Example:
S1: aha (2) so finally arrival on monday evening is still valid
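As an illustration of how the two pause notations could be read back out of a transcript line, here is a minimal sketch. The function name is hypothetical. The pattern matches any parenthesized number, so it would also pick up durations that appear inside other mark-up such as <coughs (6)>; a fuller parser would need to exclude those.

```python
import re

# "(.)" is a brief pause; "(N)" is a pause of N seconds.
PAUSE = re.compile(r"\((\.|\d+)\)")


def find_pauses(utterance: str) -> list:
    """Return pauses in order: None for a brief pause, an int for a timed pause."""
    return [None if m == "." else int(m) for m in PAUSE.findall(utterance)]


brief = find_pauses("because they all give me different (.) different (.) points of view")
timed = find_pauses("aha (2) so finally arrival on monday evening is still valid")
print(brief)  # [None, None]
print(timed)  # [2]
```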
© VOICE
5. OVERLAPS
Whenever two or more utterances happen at the same time, the overlaps are marked with
numbered tags: <1> </1>, <2> </2>,… Everything that is simultaneous gets the same
number. All overlaps are marked in blue.
Example:
S1: it is your best <1> case </1> scenario (.)
S2: <1> yeah </1>
S1: okay
All overlaps are approximate and words may be split up if appropriate. In this case, the tag is
placed within the split-up word.
Example:
S9: it it is (.) to identify some<1>thing </1> where (.)
S3: <1> mhm </1>
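Because simultaneous stretches share a number, overlapping segments can be paired up by that number. The sketch below assumes a short excerpt in which each overlap number is used only once, and it assumes turns are already split into (speaker, text) pairs; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches <N> ... </N>, where the closing tag carries the same number as the opening tag.
OVERLAP = re.compile(r"<(\d+)>(.*?)</\1>")


def group_overlaps(turns):
    """Map each overlap number to the (speaker, text) segments tagged with it."""
    groups = defaultdict(list)
    for speaker, text in turns:
        for number, segment in OVERLAP.findall(text):
            groups[int(number)].append((speaker, segment.strip()))
    return dict(groups)


turns = [
    ("S9", "it it is (.) to identify some<1>thing </1> where (.)"),
    ("S3", "<1> mhm </1>"),
]
print(group_overlaps(turns))  # {1: [('S9', 'thing'), ('S3', 'mhm')]}
```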
6. OTHER-CONTINUATION
Whenever a speaker continues, completes or supports another speaker’s turn immediately (i.e.
without a pause), this is marked by “=”.
Example:
S1: what up till (.) till twelve?
S2: yes=
S1: =really. so it’s it’s quite a lot of time.
7. LENGTHENING
Lengthened sounds are marked with a colon “:”.
Example:
S1: you can run faster but they have much mo:re technique with the ball
Exceptionally long sounds (i.e. approximating 2 seconds or more) are marked with a double
colon “::”.
Example:
S5: personally that’s my opinion the: er::m
8. REPETITION
All repetitions of words and phrases (including self-interruptions and false starts) are
transcribed.
Example:
S11: e:r i’d like to go t- t- to to this type of course
9. WORD FRAGMENTS
With word fragments, a hyphen marks where a part of the word is missing.
Example:
S6: with a minimum of (.) of participa-
S1: mhm
S6: -pation from french universities to say we have er (.) a joint doctorate or a joi- joint master
10. LAUGHTER
All laughter and laughter-like sounds are transcribed with the @ symbol, approximating
syllable number (e.g. ha ha ha = @@@). Utterances spoken laughingly are put between
<@> </@> tags.
Example:
S1: in denmark well who knows. @@
S2: <@> yeah </@> @@ that’s right
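The @ symbol does double duty here, as a laughter syllable and as part of the <@> </@> tag name, so a count of laughter syllables has to set the tags aside first. A minimal sketch under that reading, with hypothetical function names:

```python
import re

LAUGHING_SPEECH = re.compile(r"<@>(.*?)</@>")
LAUGH_TAG = re.compile(r"</?@>")


def laughter_syllables(utterance: str) -> int:
    """Count @ symbols that stand for laughter, ignoring the <@> and </@> tags."""
    without_tags = LAUGH_TAG.sub("", utterance)
    return without_tags.count("@")


def laughing_speech(utterance: str) -> list:
    """Return the stretches of speech marked as spoken laughingly."""
    return [segment.strip() for segment in LAUGHING_SPEECH.findall(utterance)]


line = "<@> yeah </@> @@ that’s right"
print(laughter_syllables(line))  # 2
print(laughing_speech(line))     # ['yeah']
```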
11. UNCERTAIN TRANSCRIPTION
Word fragments, words or phrases which cannot be reliably identified are put in parentheses ( ).
Example:
S3: i’ve a lot of very (generous) friends
Example:
SX-4: they will do whatever they want because they are a compan(ies)
17. BREATH
Noticeable breathing in or out is represented by two or three h’s (hh = relatively short; hhh =
relatively long).
Example:
S1: so it’s always hh (.) going around (2) yeah
S7: <1> <clears throat> </1>
If it is deemed important to indicate the length of the noise (e.g. if a coughing fit disrupts the
interaction), this is done by adding the number of seconds in parentheses after the descriptor.
Example:
SX-m: but you NEVER KNOW when it’s popping up you never kno:w
S3: <coughs (6)>
20. ANONYMIZATION
A guiding principle of VOICE is sensitivity to
the appropriate extent of anonymization.
As a general rule, names of people, companies,
organizations, institutions, locations, etc. are
replaced by aliases and these aliases are put into
square brackets [ ]. The aliases are numbered
consecutively, starting with 1.
Whenever speakers who are involved in the
interaction are addressed or referred to, their
names are replaced by their respective speaker
IDs.
A speaker’s first name is represented by the plain speaker ID in square brackets [S1], etc.
Example:
S9: that's one of the things (.) that i (1) just wanted to clear out. (2) [S13]?
A speaker’s last name is marked [S1/last], etc.
Example:
S6: so: (1) ei:ther MYself or mister [S2/last] or even boss (.) should be there every year
Other names or descriptors may be anonymized by [name1], etc., as in e.g. Charles University.
Example:
S8: he get the <L1cs> diplom {diploma} </L1cs> of [name1] university (.) and french university can give him also the <L1cs> diplom {diploma} </L1cs>
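The bracketed alias forms shown in this section can be told apart by their shape. The following sketch covers only the three forms that appear above ([S1], [S1/last], [name1]); any other alias types are outside its scope, and the function name and kind labels are hypothetical.

```python
import re

# Either a speaker ID with an optional "/last" suffix, or a numbered "name" alias.
ALIAS = re.compile(r"\[(?:(S\d+)(/last)?|(name\d+))\]")


def find_aliases(utterance: str) -> list:
    """Return (kind, alias) pairs for the anonymization aliases in an utterance."""
    results = []
    for speaker_id, last, name in ALIAS.findall(utterance):
        if speaker_id:
            kind = "speaker last name" if last else "speaker first name"
            results.append((kind, speaker_id))
        else:
            results.append(("other name", name))
    return results


print(find_aliases("ei:ther MYself or mister [S2/last] or even boss"))
# [('speaker last name', 'S2')]
```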
Example:
S3: one dollar you get (.) (at) one euro you get one dollar twenty-seven. (.)
S4: right. {S5 gets up to pour some drinks}
S3: right now at this time (3)
S1: er page five is the er (4) {S5 places some cups and glasses on the desk (4)}
S1: i think is the descritip- e:r part of what i have just explained (.)
Explanation:
The pause in the conversation occurs because of the contextual event.
important to push this button </4>
SS: <3> @@@@ </3>
SS: <4> @@@@@@@@ </4> @@
S7: <5> i never even saw this button in
another el- elevator </5>
SS: <5> @@@@@@@@@@ </5>
{parallel conversation between S1 and S3
ends} @@@
23. UNINTELLIGIBLE SPEECH
Unintelligible speech is represented by x’s approximating syllable number and placed
between <un> </un> tags.
Example:
S4: we <un> xxx </un> for the <7> supreme (.) three </7> possibilities
S1: <7> next yeah </7>
If it is possible to make out some of the sounds uttered, a phonetic transcription of the x’s is
added between <ipa> </ipa> tags.
Example:
S7: obviously the the PROCESS will <un> x <ipa> θeɪŋ </ipa> </un> (.) w- w- will (.) will take (.) at least de- decade
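To show how the <un> and <ipa> tags nest, the sketch below pulls out each unintelligible stretch, counts its x’s as approximate syllables, and collects any phonetic transcription given for it. The function name is hypothetical, and the IPA text is removed before counting so that its characters are never mistaken for x’s.

```python
import re

UN = re.compile(r"<un>(.*?)</un>")
IPA = re.compile(r"<ipa>(.*?)</ipa>")


def unintelligible_spans(utterance: str) -> list:
    """Return one (syllable_count, ipa_transcriptions) pair per <un> span."""
    spans = []
    for content in UN.findall(utterance):
        ipa = [segment.strip() for segment in IPA.findall(content)]
        # Drop the <ipa> sections first, then count the remaining x's.
        syllables = IPA.sub("", content).count("x")
        spans.append((syllables, ipa))
    return spans


print(unintelligible_spans("we <un> xxx </un> for the <7> supreme (.) three </7> possibilities"))
# [(3, [])]
print(unintelligible_spans("the PROCESS will <un> x <ipa> θeɪŋ </ipa> </un> (.) w- w- will"))
# [(1, ['θeɪŋ'])]
```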
In addition to the regular mark-up, transcribers supplement the transcripts with Transcriber’s
Notes in which they provide additional contextual information and observations about other
features of the interaction not accounted for in the transcript.
For a detailed discussion of specific aspects of the transcription conventions cf. Breiteneder,
Pitzl, Majewski, Klimpfinger. (2006). "VOICE recording – Methodological challenges in the
compilation of a corpus of spoken ELF". Nordic Journal of English Studies, 5/2, 161-188.