VOICE Mark-Up Conventions v2-1 PDF

TRANSCRIPTION CONVENTIONS
[2.1]
Mark-up conventions
The VOICE Transcription Conventions are protected by copyright.

Duplication or distribution to any third party of all or any part of the
material is not permitted, except that material may be duplicated by you
for your personal research use in electronic or print form. Permission for
any other use must be obtained from VOICE. Authorship must be
acknowledged in all cases.
Version 2.1 June 2007
1. SPEAKER IDS
S1:, S2:, …
Speakers are generally numbered in the order they first speak. The speaker ID is given at the beginning of each turn.
SS:
Utterances assigned to more than one speaker (e.g. an audience), spoken either in unison or staggered, are marked with a collective speaker ID SS.
SX:
Utterances that cannot be assigned to a particular speaker are marked SX.
SX-f:, SX-m:
Utterances that cannot be assigned to a particular speaker, but where the gender can be identified, are marked SX-f or SX-m.
SX-1:, SX-2:, …
If it is likely but not certain that a particular speaker produced the utterance in question, this is marked SX-1, SX-2, etc.
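The speaker ID forms described in this section can be checked mechanically. The following is a minimal sketch in Python and is not part of the VOICE conventions; the pattern, the function name `classify_speaker_id` and the category labels are assumptions made for illustration.

```python
import re

# Hypothetical pattern covering the ID forms described in this section:
# S1, S2, ... | SS | SX | SX-f, SX-m | SX-1, SX-2, ...
SPEAKER_ID = re.compile(r"^(?:S(?P<num>\d+)|SS|SX(?:-(?P<suffix>[fm]|\d+))?)$")


def classify_speaker_id(speaker_id: str) -> str:
    """Return a rough category label for a speaker ID (labels are illustrative)."""
    match = SPEAKER_ID.match(speaker_id)
    if match is None:
        return "invalid"
    if match.group("num"):
        return "identified"
    if speaker_id == "SS":
        return "collective"
    suffix = match.group("suffix")
    if suffix is None:
        return "unassigned"
    if suffix in ("f", "m"):
        return "unassigned, gender known"
    # Remaining case: SX-1, SX-2, ... (likely but not certain speaker).
    return "likely speaker"
```

For instance, `classify_speaker_id("SX-f")` yields the "unassigned, gender known" label, while a string such as `"SX-x"` is rejected as invalid.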
2. INTONATION
Words spoken with rising intonation are followed by a question mark “?”.
Example:
S1: that’s what my next er slide? does
Words spoken with falling intonation are followed by a full stop “.”.
Example:
S7: that’s point two. absolutely yes.
3. EMPHASIS
If a speaker gives a syllable, word or phrase particular prominence, this is written in capital letters.
Example:
S7: er internationalization is a very IMPORTANT issue
Example:
S3: toMORrow we have to work on the presentation already
4. PAUSES
Every brief pause in speech (up to a good half second) is marked with a full stop in parentheses.
Example:
SX-f: because they all give me different (.) different (.) points of view
Longer pauses are timed to the nearest second and marked with the number of seconds in parentheses, e.g. (1) = 1 second, (3) = 3 seconds.
Example:
S1: aha (2) so finally arrival on monday evening is still valid
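As an illustration of how the two pause notations could be processed, here is a minimal Python sketch. It is not part of the conventions; the function name `pause_summary` is hypothetical, and the sketch does not distinguish pauses from durations given inside other mark-up such as `<coughs (6)>`.

```python
import re

# (.) marks a brief pause; (N) marks a pause timed to N seconds.
PAUSE = re.compile(r"\((\.|\d+)\)")


def pause_summary(turn: str) -> tuple[int, int]:
    """Return (number of brief pauses, total seconds of timed pauses)."""
    brief = 0
    timed_seconds = 0
    for token in PAUSE.findall(turn):
        if token == ".":
            brief += 1
        else:
            timed_seconds += int(token)
    return brief, timed_seconds
```

Parenthesised words such as `(generous)`, which mark uncertain transcription, are not matched because the pattern accepts only a full stop or digits between the parentheses.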
© VOICE
5. OVERLAPS
Whenever two or more utterances happen at the same time, the overlaps are marked with numbered tags: <1> </1>, <2> </2>,… Everything that is simultaneous gets the same number. All overlaps are marked in blue.
Example:
S1: it is your best <1> case </1> scenario (.)
S2: <1> yeah </1>
S1: okay
All overlaps are approximate and words may be split up if appropriate. In this case, the tag is placed within the split-up word.
Example:
S9: it it is (.) to identify some<1>thing </1> where (.)
S3: <1> mhm </1>
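Because simultaneous stretches share a tag number, overlapping speech can be grouped by that number. The sketch below is a hedged Python illustration, not a VOICE tool; `collect_overlaps` and the `(speaker_id, text)` input shape are assumptions, and the grouping is only meaningful within a short stretch of transcript in which each number is used once.

```python
import re
from collections import defaultdict

# Matches <1> ... </1>, <2> ... </2>, etc.; \1 forces the closing number to match.
OVERLAP = re.compile(r"<(\d+)>(.*?)</\1>")


def collect_overlaps(turns):
    """Group overlapped stretches by tag number.

    `turns` is a list of (speaker_id, text) pairs. The result maps each
    overlap number to a list of (speaker_id, overlapped_text) pairs.
    """
    overlaps = defaultdict(list)
    for speaker, text in turns:
        for number, content in OVERLAP.findall(text):
            overlaps[int(number)].append((speaker, content.strip()))
    return dict(overlaps)
```

Applied to the first example of this section, overlap 1 would contain "case" from S1 and "yeah" from S2. A tag placed inside a split-up word, as in `some<1>thing </1>`, yields only the overlapped part ("thing").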
6. OTHER-CONTINUATION
Whenever a speaker continues, completes or supports another speaker’s turn immediately (i.e. without a pause), this is marked by “=”.
Example:
S1: what up till (.) till twelve?
S2: yes=
S1: =really. so it’s it’s quite a lot of time.
7. LENGTHENING
Lengthened sounds are marked with a colon “:”.
Example:
S1: you can run faster but they have much mo:re technique with the ball
Exceptionally long sounds (i.e. approximating 2 seconds or more) are marked with a double colon “::”.
Example:
S5: personally that’s my opinion the: er::m
8. REPETITION
All repetitions of words and phrases (including self-interruptions and false starts) are transcribed.
Example:
S11: e:r i’d like to go t- t- to to this type of course
9. WORD FRAGMENTS
With word fragments, a hyphen marks where a part of the word is missing.
Example:
S6: with a minimum of (.) of participa-
S1: mhm
S6: -pation from french universities to say we have er (.) a joint doctorate or a joi- joint master
10. LAUGHTER
All laughter and laughter-like sounds are transcribed with the @ symbol, approximating syllable number (e.g. ha ha ha = @@@). Utterances spoken laughingly are put between <@> </@> tags.
Example:
S1: in denmark well who knows. @@
S2: <@> yeah </@> @@ that’s right
11. UNCERTAIN TRANSCRIPTION
Word fragments, words or phrases which cannot be reliably identified are put in parentheses ( ).
Example:
S3: i’ve a lot of very (generous) friends
Example:
SX-4: they will do whatever they want because they are a compan(ies)
12. PRONUNCIATION VARIATIONS & COINAGES
Striking variations on the levels of phonology, morphology and lexis as well as ‘invented’ words are marked <pvc> </pvc>.
Example:
S4: i also: (.) e:r played (.) tennis e:r <pvc> bices </pvc> e:r we rent? went?
What you hear is represented in spelling according to general principles of English orthography. Uncertain transcription is put in parentheses ( ).
Example:
S9: how you were controlling such a thing and how you <pvc> (avrivate) </pvc> (it)
If a corresponding existing word can be identified, this existing word is added between curly brackets { }.
Example:
S6: what we try to explain here is the foreign direct investment growth (2) in a certain industry (.) and a certain <pvc> compy {company} </pvc>
Particularly when it comes to salient variations on the level of phonology, e.g. sound substitution or addition, a phonetic representation should be added between <ipa> </ipa> tags.
Example:
S2: anyway i make you an a total (.) <pvc> summamary {summary} <ipa> sʌməˈmærɪ </ipa> </pvc> of destinations
13. ONOMATOPOEIC NOISES
When speakers produce noises in order to imitate something instead of using words, these onomatopoeic noises are rendered in IPA symbols between <ono> </ono> tags.
Example:
S1: it may be quite HARMLESS and at the end of the day you (.) <ono> dəʃ dəʃ dəʃ </ono> (.) somebody
14. NON-ENGLISH SPEECH

Utterances in a participant’s first language (L1) are put between tags indicating the speaker’s L1.
Example:
S5: <L1de> bei firmen </L1de> or wherever
Utterances in languages which are neither English nor the speaker’s first language are marked LN with the language indicated.
Example:
S7: er this is <LNde> die seite? (welche) </LNde> is
Non-English utterances where it cannot be ascertained whether the language is the speaker’s first language or a foreign language are marked LQ with the language indicated.
Example:
S4: it depends in in in <LQit> roma </LQit>
Unintelligible utterances in a participant’s L1, LN or in an LQ are represented by x’s approximating syllable number.
Example:
S2: erm we want to go t- to <LNvi> xx xxx </LNvi> island first of all
Utterances in a language one cannot recognize are marked L1xx, LNxx or LQxx.
Example:
S4: and now we do the boat trip (1) <L1xx> xxxxx </L1xx>
S3: mhm
If possible, translations into English are provided between curly brackets { } immediately after the non-English speech.
Example:
S3: <L1fr> oui un grand carre {yes like a big square} </L1fr> (.) i <fast> think it would </fast> be better if we put the tables a <soft> different way </soft>
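The language tags combine a type (L1, LN or LQ) with a language code, and a translation may follow in curly brackets. The Python sketch below shows one way such stretches might be extracted. It is an illustration only: the function name `non_english_stretches` is hypothetical, and the two-letter code length is an assumption based on the examples in this section (de, it, vi, fr, xx).

```python
import re

# <L1de>, <LNde>, <LQit>, <L1xx>, ...: type (L1/LN/LQ) plus a language code.
# Assumption: codes are two lowercase letters, as in the examples above.
LANG_TAG = re.compile(r"<(L1|LN|LQ)([a-z]{2})>(.*?)</\1\2>")
# A translation, if present, sits in curly brackets inside the tagged stretch.
TRANSLATION = re.compile(r"\{(.*?)\}")


def non_english_stretches(turn):
    """Return one dict per non-English stretch found in a turn."""
    results = []
    for kind, code, content in LANG_TAG.findall(turn):
        translation = TRANSLATION.search(content)
        results.append({
            "kind": kind,
            # "xx" marks a language the transcriber could not recognize.
            "language": None if code == "xx" else code,
            "text": TRANSLATION.sub("", content).strip(),
            "translation": translation.group(1) if translation else None,
        })
    return results
```

For the turn containing `<L1fr> oui un grand carre {yes like a big square} </L1fr>`, the sketch returns a single entry of kind L1, language fr, with the French text and its English translation separated.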
15. SPELLING OUT

The <spel> </spel> tag is used to mark words or abbreviations which are spelled out by the speaker, i.e. words whose constituents are pronounced as individual letters.
Example:
S1: and they (3) created some (1) some er (2) JARGON. do you know? the word JARGON? (.) <spel> j a r- </spel> <spel> j a r g o n? </spel> jargon
16. SPEAKING MODES

Utterances which are spoken in a particular mode (fast, soft, whispered, read, etc.) and are notably different from the speaker’s normal speaking style are marked accordingly. The list of speaking modes is an open one.
Example:
S2: because as i explained before is that we have in the <fast> universities of cyprus we have </fast> a specific e:rm procedure

<fast> </fast>

<slow> </slow>
<loud> </loud>
<soft> </soft>
<whispering> </whispering>
<sighing> </sighing>
<reading> </reading>
<reading aloud> </reading aloud>
<on phone> </on phone>
<imitating> </imitating>
<singing> </singing>
<yawning> </yawning>
17. BREATH
Noticeable breathing in or out is represented by two or three h’s (hh = relatively short; hhh = relatively long).
Example:
S1: so it’s always hh (.) going around (2) yeah
18. SPEAKER NOISES

Noises produced by the current speaker are always transcribed. Noises produced by other speakers are only transcribed if they seem relevant (e.g. because they make speech unintelligible or influence the interaction). The list of speaker noises is an open one.
<coughs>
<clears throat>
<sniffs>
<sneezes>
<snorts>
<applauds>
<smacks lips>
<yawns>
<whistles>
<swallows>
These noises are transcribed as part of the running text and put between pointed brackets < >.
Example:
S1: yeah <1> what </1> i think in in doctor levels
S7: <1> <clears throat> </1>
If it is deemed important to indicate the length of the noise (e.g. if a coughing fit disrupts the interaction), this is done by adding the number of seconds in parentheses after the descriptor.
Example:
SX-m: but you NEVER KNOW when it’s popping up you never kno:w
S3: <coughs (6)>
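Speaker noises share the pointed-bracket syntax with other tags, so a script has to tell them apart by descriptor. The following Python sketch is one hedged way to do that; `speaker_noises` and `KNOWN_DESCRIPTORS` are hypothetical names, and since the conventions state that the list of noises is open, the fixed set used here is a simplifying assumption.

```python
import re

# Descriptors listed in this section; the conventions treat the list as open,
# so a real script would need to extend this set (assumption for illustration).
KNOWN_DESCRIPTORS = {
    "coughs", "clears throat", "sniffs", "sneezes", "snorts",
    "applauds", "smacks lips", "yawns", "whistles", "swallows",
}

# <descriptor> or <descriptor (N)>, where N is a duration in seconds.
TAG = re.compile(r"<([a-z ]+?)(?: \((\d+)\))?>")


def speaker_noises(turn):
    """Return (descriptor, seconds or None) for each known speaker noise."""
    noises = []
    for descriptor, seconds in TAG.findall(turn):
        if descriptor in KNOWN_DESCRIPTORS:
            noises.append((descriptor, int(seconds) if seconds else None))
    return noises
```

Tags such as `<soft>` or numbered overlap tags are ignored because their content is not in the descriptor set.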
19. NON-VERBAL FEEDBACK

Whenever information about it is available, non-verbal feedback is transcribed as part of the running text and put between pointed brackets < >.
<nods>
<shakes head>
If it is deemed important to indicate the length of the non-verbal feedback, this is done by adding the number of seconds in parentheses.
Example:
S3: but i think if you structure corporate governance appropriately you can have everything (1)
S7: <soft> mhm </soft> <nods (2)>
20. ANONYMIZATION
A guiding principle of VOICE is sensitivity to
the appropriate extent of anonymization.
As a general rule, names of people, companies,
organizations, institutions, locations, etc. are
replaced by aliases and these aliases are put into
square brackets [ ]. The aliases are numbered
consecutively, starting with 1.
Whenever speakers who are involved in the
interaction are addressed or referred to, their
names are replaced by their respective speaker
IDs.
A speaker’s first name is represented by the plain speaker ID in square brackets [S1], etc.
Example:
S9: that's one of the things (.) that i (1) just wanted to clear out. (2) [S13]?
A speaker’s last name is marked [S1/last], etc.
Example:
S6: so: (1) ei:ther MYself or mister [S2/last] or even boss (.) should be there every year
If a speaker’s full name is pronounced, the two tags are combined to [S1] [S1/last], etc.
Example:
S8: so my name is [S8] [S8/last] from vienna
Names of people who are not part of the ongoing interaction are substituted by [first name1], etc. or [last name1], etc. or a combination of both.
Example:
S2: that division is headed by (1) [first name3] [last name3] (1)
Companies and other organizations need to be anonymized as well. Their names are replaced by [org1], etc.
Example:
S5: erm she is currently head of marketing (and) with the [org2] (1)
Names of places, cities, countries, etc. are anonymized when this is deemed relevant in order to protect the speakers’ identities and their environment. They are replaced by [place1], etc.
Example:
S1: i: i really don’t wanna have a: a joint degree e:r with the university of [place12] (.)
Other names or descriptors may be anonymized by [name1], etc., as in e.g. Charles University.
Example:
S8: he get the <L1cs> diplom {diploma} </L1cs> of [name1] university (.) and french university can give him also the <L1cs> diplom {diploma} </L1cs>
Products or other objects may be anonymized by [thing1], etc.
Example:
S3: erm i- in the [thing1] is very well explained. so <2> i can </2> pa- <3> er pass you this </3> th- the definitions.
S4: <2> aha </2>
S4: <3> okay <@> okay </@> </3>
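The alias forms introduced in this section follow a regular pattern, which makes them easy to locate in a transcript. The Python sketch below is an illustration under stated assumptions, not an official VOICE utility: `find_aliases` is a hypothetical name, and the alias kinds are limited to those shown in the examples above.

```python
import re

# Speaker aliases: [S1] (first name) or [S1/last] (last name).
# Other aliases: [first name1], [last name1], [org1], [place1], [name1], [thing1].
ALIAS = re.compile(
    r"\[(?:"
    r"(?P<speaker>S\d+)(?P<last>/last)?"
    r"|(?P<kind>first name|last name|org|place|name|thing)(?P<index>\d+)"
    r")\]"
)


def find_aliases(text):
    """Return a list of aliases found in `text`.

    Speaker aliases are returned as (speaker_id, "first name" | "last name");
    all other aliases as (kind, running_number).
    """
    found = []
    for m in ALIAS.finditer(text):
        if m.group("speaker"):
            part = "last name" if m.group("last") else "first name"
            found.append((m.group("speaker"), part))
        else:
            found.append((m.group("kind"), int(m.group("index"))))
    return found
```

Pause markers such as `(1)` and overlap tags are left alone, since only square-bracketed items matching one of the alias forms are reported.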
21. CONTEXTUAL EVENTS

Contextual information is added between curly brackets { } only if it is relevant to the understanding of the interaction or to the interaction as such. If it is deemed important to indicate the length of the event, this can be done by adding the number of seconds in parentheses.
{mobile rings}
{S7 enters room}
{S2 points at S5}
{S4 starts writing on blackboard}
{S4 stops writing on blackboard}
{S2 gets up and walks to blackboard (7)}
{S3 pours coffee (3)}
{SS reading quietly (30)}
…
Example:
S3: one dollar you get (.) (at) one euro you get one dollar twenty-seven. (.)
S4: right. {S5 gets up to pour some drinks}
S3: right now at this time (3)
S1: er page five is the er (4) {S5 places some cups and glasses on the desk (4)}
S1: i think is the descritip- e:r part of what i have just explained (.)
Explanation:
The pause in the conversation occurs because of the contextual event.
22. PARALLEL CONVERSATIONS

To indicate that a speaker is addressing not the whole group but one speaker in particular, the stretch of speech is marked with (e.g.) <to S1> </to S1>, choosing the speaker ID of the addressee.
Example:
S1: four billion <spel> u s </spel> dollars. (.)
S4: quite impressive (.)
S1: er <to S2> not quite isn’t it </to S2> (.) i understand some other countries we handle
Wherever two or more conversational threads emerge which are too difficult to transcribe, as a general rule only the main thread of conversation is transcribed. The threads which are not transcribed are treated like a contextual event and indicated between curly brackets { }.
Example:
S7: i’ve i’ve found the people very stressed
SS: @@@
S7: that's (.) i don’t know how many of you study here but it’s VERY important to push the close the door button in that elevator. this is something i’ve never <3> seen in sweden </3> {parallel conversation between S1 and S3 starts} or anywhere else <4> but it’s very important to push this button </4>
SS: <3> @@@@ </3>
SS: <4> @@@@@@@@ </4> @@
S7: <5> i never even saw this button in another el- elevator </5>
SS: <5> @@@@@@@@@@ </5> {parallel conversation between S1 and S3 ends} @@@
23. UNINTELLIGIBLE SPEECH
Unintelligible speech is represented by x’s approximating syllable number and placed between <un> </un> tags.
Example:
S4: we <un> xxx </un> for the <7> supreme (.) three </7> possibilities
S1: <7> next yeah </7>
If it is possible to make out some of the sounds uttered, a phonetic transcription of the x’s is added between <ipa> </ipa> tags.
Example:
S7: obviously the the PROCESS will <un> x <ipa> θeɪŋ </ipa> </un> (.) w- w- will (.) will take (.) at least de- decade
24. TRANSCRIPTION BORDERS

The beginning of the transcript is noted by indicating the CD number, the track number and the exact position of the respective track in minutes and seconds.
<beg CD1_4_00:35>
The end of the transcript is noted in the same way.
<end CD1_21_01:27>
A gap in the transcription is indicated in parentheses, including its length in hh:mm:ss. Curly brackets { } are used in order to specify the reasons for or the circumstances of the gap.
<end CD1_19_01:27>
(gap 00:06:36) {multiple parallel conversations, hardly intelligible}
<beg CD1_21_02:03>
An interruption in the recording is indicated in the same way, but abbreviated as “nrec” (i.e. non-recorded). The length you indicate will normally be a guess.
<end CD1_24_3:02>
(nrec 00:00:45) {change of minidisk}
<beg CD2_1_00:00>
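The border, gap and interruption markers have a fixed shape, so their positions and durations can be converted to seconds. The Python sketch below illustrates this under stated assumptions: `parse_border` and `gap_seconds` are hypothetical helper names, and the pattern accepts one- or two-digit minutes because both `00:35` and `3:02` occur in the examples of this section.

```python
import re

# <beg CD1_4_00:35> / <end CD1_21_01:27>: CD number, track number, mm:ss.
BORDER = re.compile(r"<(beg|end) CD(\d+)_(\d+)_(\d{1,2}):(\d{2})>")
# (gap 00:06:36) / (nrec 00:00:45): length in hh:mm:ss.
GAP = re.compile(r"\((gap|nrec) (\d{2}):(\d{2}):(\d{2})\)")


def parse_border(tag):
    """Parse a beg/end tag into its kind, CD, track and offset in seconds."""
    m = BORDER.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a border tag: {tag!r}")
    kind, cd, track, minutes, seconds = m.groups()
    return {
        "kind": kind,
        "cd": int(cd),
        "track": int(track),
        "offset_seconds": int(minutes) * 60 + int(seconds),
    }


def gap_seconds(marker):
    """Return (kind, length in seconds) for a gap or nrec marker."""
    m = GAP.fullmatch(marker)
    if m is None:
        raise ValueError(f"not a gap marker: {marker!r}")
    kind, hours, minutes, seconds = m.groups()
    return kind, int(hours) * 3600 + int(minutes) * 60 + int(seconds)
```

With these helpers, `(gap 00:06:36)` corresponds to a gap of 396 seconds, and `<end CD1_24_3:02>` to an offset of 182 seconds into track 24 of CD 1.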
In addition to the regular mark-up, transcribers supplement the transcripts with Transcriber’s
Notes in which they provide additional contextual information and observations about other
features of the interaction not accounted for in the transcript.
For a detailed discussion of specific aspects of the transcription conventions cf. Breiteneder,
Pitzl, Majewski, Klimpfinger. (2006). "VOICE recording – Methodological challenges in the
compilation of a corpus of spoken ELF". Nordic Journal of English Studies, 5/2, 161-188.