Sunteți pe pagina 1din 14

Beginning a monologue: The opening sequence of video blogs

Maximiliane Frobenius *
Saarland University, Geb. C5 3, Campus Saarbrcken, D-66123 Saarbrcken, Germany
1. Introduction
This paper describes the opening sequences of video blogs (vlogs). It presents the various ways people make use of
conventions from other genres, how they make up and try to establish their own conventions, and how this facilitates the
production of a vlog, especially the opening sequence. Research on openings and research on monologues is applied to vlog
data, integrating the two strands.
Schegloff (1968), in his article Sequencing in Conversational Openings, focussed on the allocation of speaker roles (a and
b, or rst speaker and second speaker) between two parties. Assuming that conversation between two or more participants is
the default, most frequent pattern, then monologues, which lack an opening phase constructed around fairly quick
exchanges of turns, must pose a challenge. Garrod and Pickering (2004) claim that dialogue is easier to produce than
monologue because of an interactive processing mechanism that leads to the alignment of linguistic representations
between partners. However, in the production of a monologue, there is no second speaker to negotiate these roles with. Thus,
single speakers must develop compensatory strategies that make up for the missing co-construction which entails a lack of
turn taking and negotiation of speaker roles. These strategies are a rich area of study whose exploration will help illuminate
generally how speakers adapt to their speech situation, not only in the context of monologues, but also in the context of
newly developing media more generally. The current study gives an overview of the different strategies vloggers employ,
proposing a taxonomy of different practices in opening sequences. As vlogs make up a fairly young genre which is still
developing, this categorization cannot encompass the complete range of openings.
Thepresent investigationapplies previous ndings fromresearchonmonologues andresearchonopenings tothenovel area
of vlogs. Thestudylays out thefeatures of talkthat is bereft of salient dialogic elements, i.e. phaticcommunionandsequentality,
which cannot occur due to the missing interlocutor. I demonstrate how the transition fromsilence to talk is managed instead
Journal of Pragmatics 43 (2011) 814827
Article history:
Received 12 November 2009
Received in revised form 17 September 2010
Accepted 19 September 2010
Computer-mediated communication
Video blog
This study investigates the beginning sequences of video blogs, a relatively new form of
computer-mediated communication. It analyzes spoken language with regards to the
three salient factors that shape the situation the passages under analysis occur in: (1) it is
monologic language, (2) the passages are opening sequences, (3) the passages are taken
from a CMC context. As a result, the paper provides a taxonomy of practices commonly
used in this setting. Furthermore, I demonstrate that speakers develop and borrow
strategies to compensate for the missing interlocutor. Openings in video blogs do not
necessarily have the same functions as conversational openings in other settings. They
represent an interactional element to encourage viewers to respond via the interactive
feature embedded in the website, and they work toward identity construction.
2010 Elsevier B.V. All rights reserved.
* Tel.: +49 681 3022214; fax: +49 681 3023670.
E-mail address:
Contents lists available at ScienceDirect
Journal of Pragmatics
j our nal homepage: www. el sevi er . com/ l ocat e/ pr agma
0378-2166/$ see front matter 2010 Elsevier B.V. All rights reserved.
usingstrategies borrowedfromother monologic genres, while taking intoconsiderationthe technical options of the vloggenre.
These options render the data unique in the study of computer-mediated communication (CMC) or mediated discourse in
general, since two phases of production of unscripted, spoken material need to be distinguished, as I showin the next section.
Everything that is usually negotiated in openings (availability for interaction, identication, social status, alignment of
participants) is irrelevant for vlog openings, since (some of) these things happenvia different channels. So if openingsequences
in vlogs are not actually needed, one must ask why some speakers have an introductory phase. As early as 1956, Horton and
Wohl coined the term para-social interaction to describe practices used in TV presenters talk, such as direct address of the
audience. Tolson(2005) states about greetings directedat the audienceinmonologicmediatalkthat thetalkconstructs a place
for potential interaction, whether or not it is takenupinpractice (10). Hecontinues: It might simplybeawayof reachingout to
the active listener, provoking a basic form of active listenership. Transferred to vlog openings, these statements assign
interactive elements of the vlog text the function of persuading the viewers to make use of the various features of the website
that allow them to reply: writing comments, rating the video, sending a personal message or posting a video response.
On a theoretical level, this paper contributes to the description of monologues as interaction by revealing what resources
speakers use to appeal to their audience. Speakers on vlogs clearly try to make their videos interesting to their audience, to
include themin the interaction by discussing relevant topics and addressing themdirectly, or to persuade themto engage in
two-way interaction via channels outside the actual video. The basic underlying assumption is that audience design is a
driving factor in this genre, even though the audience is largely unknown to the speaker and does not take an active part in
the spoken interaction. The rst moments of a video are highly relevant for the viewers decision whether to continue
watching it or not, and this represents an incentive for vloggers to make their openings particularly relevant to the audience.
The data consist of a corpus that currently contains 100 vlogs that were made and uploaded to the website YouTube
between 2006 and 2010. They vary in length between 29 s and 9 min, 32 s. They were recorded by 26 female and 32 male
speakers. The speakers whose vlogs are analyzed in this paper were asked for permission to use their video data for linguistic
analysis. Two users are mentioned by their online username only either because their contact data was not available, or they
did not reply to the request.
This paper introduces vlogs as a genre, followed by an overview of previous research on openings and on monologues.
Data analysis forms the main body of this study. The paper concludes with a discussion of the ndings regarding openings,
monologues, and CMC.
2. Vlogs as genre
Vlogs are a relatively newmultimodal genre of CMC, involving a speaker shooting video footage of him- or herself, which
is later uploaded onto the internet. In the production of a vlog, there are two phases: the taping of the material, during which
speech production takes place, and the editing of the video, during which the original sequences can be altered signicantly.
Depending on video editing skills and the vloggers choices, vlogs display various degrees of editing work. Thus, during the
editing phase, vloggers make decisions about every image and every sound they have recorded.
Most vlogs feature a single speaker. Usually, there are no signs of other peoples presence during the taping. Vlogs
instantiate non-scripted, non-institutionalized monologue situations, as opposed to fairly conventionalized situations such
as lectures, news reports, radio broadcast talk, sermons, etc. They display parallels to answering machine messages, which
are also generally unscripted but have, as a genre, developed conventions. With lectures, sermons, and news reports, there is
an obvious reversal in the order of appearance of content and medium/genre: clearly, the genre has been developed as a
means of spreading information effectively. Lectures are an established vehicle for the dissemination of topic bound
information fromone expert to a potentially large audience. Sermons work along the same pattern only in a different setting
and with more restrictions with regard to the topic. News reports are one realization of spreading information about recent
events, a need for which has long existed before they took on the form of TV or radio monologues. Vlogs, on the other hand,
developed rst as a new medium: as video hosting websites facilitated the exchange of video data, users started generating
original content to make use of this medium created by technological progress.
The vlog setting differs fromother monologue settings. There are signs of nervousness or hesitation present in vlogs, such
as laughter, that are not expected at the beginning of lectures, sermons, news reports, etc., which might, however, occur in
answering machine talk. I ascribe this primarily to the lack of conventions and the free choice of topic that are features of
vlogs, and to some degree of answering machine talk, but not of the other genres mentioned. Likewise, there are no temporal
restrictions for vlogs. Both the news and lectures have xed times for the beginning and the end. Sermons have an assigned
The Association of Internet Researchers (AoIR) suggests several criteria to be checked concerning internet research ethics (Ess, 2002). The following
arguments based on these criteria lead me to believe that using this data is ethically acceptable. The interaction under study takes place on a publicly
accessible website. This site offers its users to restrict access to certainother users, i.e. a private modus. None of the videos investigated inthis paper were set
to that modus. Furthermore, vloggers regularly show signs of their consciousness of the fact that they are in a public setting, as they encourage actions
(please rate, comment and subscribe) that create more trafc and thus attract more viewers to their channel. YouTubes terms of use state that Any
personal information or video content that you voluntarily disclose online (e.g., video comments, your prole page) may be collected and used by others.
Creators of the material (vloggers) whose content is cited in this article are (according to the information they give online) over 18 years of age; vlogs by
underage users are not explicitly discussed here, however, they are part of the data discussed in section 4. There are no ethically signicant risks (Ess,
2002:7) for vloggers whose content is discussed here: rstly, the research is concerned with the form of the language used, not the content, secondly,
opening sequences hardly represent sensitive topics.
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 815
slot in the xed order of events in a church service, resulting in restrictions with respect to length. However, vlogs can be
recorded at any time. So, lecturers, preachers, and news reporters have to adhere to certain time frames when delivering
their work. Vloggers, on the other hand, may postpone the job of recording, or they may shoot and re-shoot footage as often
as they want. Callers who leave answering machine messages have to make several choices which restrict them in certain
ways: rst, they need to decide to leave a message in the rst place; second, once they have decided that, they cannot choose
when to tape the message, thus they have no preparation time. This could be a reason for the development of patterns in
answering machine talk: callers might feel under pressure to produce coherent talk immediately and on record. A generally
accepted template along which to produce this talk is a helpful strategy.
Vlogs haveaspecial theoretical signicanceinthat theyconstituteagenresoyoungthat theconventions are still inaprocess
of negotiation. Theconditions avlogger faces aresimilar tothoseof atelevisionnews presenter. Inbothsettings, peopletalkinto
a generallyimmobile camera, whichrestricts speakers toa limitedarea not just tostaywithinthe cameraframe, but alsotostay
within the microphone sensitivity range. Also the cameras angle of view is smaller than a humans eld of vision that is
available in face-to-face conversation, and of course it is two-dimensional. Despite these similarities, the outcome of the two
genres is quite different. Thus, vlogs are signicantlyshaped by the social context: a vlogger is anindependent (usually, but not
necessarily) unpaid, private and untrained individual, while a TV news presenter is a journalist representing a broadcast
network. Vloggers usually reach between just a few (less than 100) and several thousands of viewers, with few exceptions
that run into the millions. News broadcasts on TV regularly attract a multiple of those numbers. Vlogs can be about any
topic, presented as personal opinions, while news broadcasts report recent events often assuming an objective stance. As a
result, vloggers speakandbehaveless formallythanTVpresenters, whoadhere toa certainformal etiquette. Vloggers laugh, tell
jokes, sing, vary in volume (from whispering to yelling), use taboo language and other informal vocabulary, etc.
Vlogs are distinct fromblogs, another asynchronous mode of onlinecommunication, inseveral ways, but the twogenres also
share certain features. Blogs are dened as frequently modied web pages containing dated entries listed in reverse
chronological sequence (Herring et al., 2004). They are predominantly made up of written language, though some contain
images or have video sequences embedded. All posts that are archived on one users page make up his or her blog. Vlog posts
constitute a vlog by themselves; the entirety of one users vlog posts is archived on their prole page (channel), also in
chronological order. Thus, a blogger posts on their blog, a vlogger posts a vlog. Many blogs can be characterized as single-
authored, personal diaries (Herring et al., 2005); vloggers, too, tend to discuss personal matters in monologic form. Both
vloggers andbloggers arealways incontrol of their posts, makingdecisions about thelength, topic, degreeof formalityetc. Blogs
cancontainlinks toother websites, for exampleas hyperlinks inthe text. Vloggers have the optionof editing hyperlinks intothe
video frame of their vlog, though this is one of the more recent technical features. In both genres, recipients can post written
comments (althoughthis function canbe disabled, or, for a number of blogs, it is not part of the blogger software (Herring et al.,
2005)); in the case of vlogs, one can post another video as a response. This appears underneath the video frame.
These similarities, and of course the derivation of the name vlog fromblog, suggest that vlogs constitute a sub-genre,
or type, of blog. Scheidt (2009), quoting Wikipedia, denes vlogs as weblogs that use video to tell their stories, often instead
of or in addition to text (p. 44). That classication seems sensible froma perspective that describes an internet genre ecology
emphasizing the similarities in purpose, authorial options and subsequent organization. From a linguistic perspective, I
consider essential the distinction between vlogs that use spoken language and moving images (including gestures, gaze,
shifts in pitch and volume , etc.) and written weblogs, resulting in a system that classies vlog and blog as sub-cases of a
shared genre.
As a working denition of vlogs I suggest a video sequence similar to a blog that a user (vlogger) shoots of him- or herself
talking into a camera and, after optional editing, uploads to the internet, where viewers can rate it and/or leave comments in
written or video form. I distinguish between vlogs and other videos such as sketches or how-to videos based on the main
focus on the content of the spoken language: vloggers predominantly tell stories or discuss topical events. The focus in a
sketch or how-to video is on the action shown in the images that are accompanied by spoken language.
3. Research on openings and research on monologues
Gold (1991) identies four sections in answering machine messages and describes the devices callers use to compensate
for their absent interlocutor. She observes that the greeting section (p. 246) contains elements borrowed from other
genres: greetings are reminiscent of the ritualized salutations in letters; information such as date and/or location and time
also occur in the letter frame. Further, she likens answering machine talk greetings to written language as both contain
features that compensate for the time/space gap between interlocutors. Gold describes self-identication as a feature of
answering machine messages, while she deems it pragmatically inappropriate in both face-to-face and regular telephone
conversation. Vlog openings contain features borrowed from other genres as well: some contain self-identication and the
date the video is posted. Vlogs and answering machine messages differ in important respects: vlogs are audio-visual rather
than audio data, and they are recorded with the knowledge that one is about to produce a monologue which can therefore be
planned in advance. Answering machine messages are unplanned recordings, by callers who expected to have a dialogue
with another person. Vloggers can edit their material after recording it; callers speaking on answering machine tapes cannot
do so in most cases.
Liddicoat (1994), expanding the research by Gold (1991) to cover both the pre-recorded message by the called party and
the spontaneously recorded message by the caller, points out that technological means of communication place constraints
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 816
on talk which lead to systematic modication of the available stock of conversational routines (p. 308). Interactive routines
that regularly take place in face-to-face or telephone interaction may be truncated or avoided (p. 307). Since vlogs, too, place
constraints on talk, it is to be expected that modications or a different usage pattern of routine phrases show up in them.
Aijmer (2007) analyzes two sets of answering machine message data, studying the use of routine phrases. She says of
answering machine message openings that a greeting phrase suggests that you are aware of addressing a person rather than
a machine. And a particular event may, but need not be accompanied by a routinized phrase. (p. 329) Thus, she
distinguishes between messages where callers imagine or construct a recipient and those where callers view the answering
machine only as a machine. The concept of an imagined or constructed audience helps explain why vloggers greet and
address their viewers. But it is not one to one applicable to the vlog situation since it is difcult to argue that vlogs without
greeting and address were not lmed with an imagined audience in mind. Further, the study illustrates that routine phrases
used in certain speech events or certain media develop over time. This suggests that for vlogs this process might be in its
early stages, and quite possibly it will take another few years until comparable conventions can be observed in vlog data.
Schegloff (1967) observes about the beginnings of phone calls that openings [. . .] provide [. . .] continuity into the body of
conversation and that they are not separable or dispensable (p. 51). He explains how the non-terminality of summons-
answer (SA) sequences helps establish the coordinated entry into abab sequencing in phone conversations and how SA
sequences align the roles of speaker and hearer (p. 122). In presupposing that conversation is a minimally two party
activity he implicitly states that monologues have a different status. Hence, there does not seem to be any ground for
summonses in monologue openings at all.
Furthermore, Schegloff states that a multiplicity of jobs . . . get done in openings, such as checking the availability of some
co-present person to engage in interaction, (re)constitution of the relationship of the parties, and negotiation of topic
(1986:113). Since invlogs none of these functions are necessaryfor the talk toproceed, a recognizable opening sequence seems
dispensable. The functions listed by Schegloff are non-negotiable in the vlog situation. The viewer can choose not to enter the
situationat all by not playing the video. However, that decisionis based onfactors relevant before the talk starts (the username
of the vlogger, the title of the vlog, its description, and the thumbnail image one sees on the website). It is not subject to some
joint activity in the talk. Therefore, we can assume that openings in vlogs must fulll other functions. Since viewers can also
decide to stopwatching the vlog at any moment, one principal function might be to raise the viewers interest and induce them
to watch the whole vlog. Similarly, Labov and Fanshel (1977) use the concept of reportability to describe storytellers
justications for their narratives to prevent rejection from the interlocutors on the basis of being ordinary.
Laver (1975, 1981) observes that the use of routine formulae is most dense in the marginal phases of interaction, i.e.
openings and closings, and that negotiations about social status and role are conducted with the help of formulaic phrases,
address terms and phatic communion in them. This serves to defuse the potential hostility of silence (p. 226). Phatic
communion can in this sense not take place in a vlog since there is no immediate negotiation. The opening sequence can
nonetheless be used by the vlogger to display status or social role.
Two studies that are concerned with openings in written CMC contexts are Rintel et al. (2001) and Waldvogel (2007).
Rintel et al. investigate openings in internet relay chat (IRC). They discuss the various channel entry phase (CEP)
progressions, made up of automatically generated messages by the server and the users own posts, which can take place
when a newuser joins a chat room. Waldvogel studies email openings in work place settings. As in this study of vlogs, there
are email openings that contain both greeting and term of address, neither of the two, or just one of the two. The choice of
combination (and the choice of words) reect back on social variables, e.g. gender, power structures within the work place,
seniority, and on the degree of harmony and support among staff.
Literature on monologues is often tied to the genre of classroom interaction. Studies rarely focus on the beginning of
monologues, but they describe other phenomena instead. Montgomery (1981) analyzes the structure of lecture monologues.
Clark (1996) denes monologue as a situation where one person speaks with little or no opportunity for interruption or
turns by members of the audience, clearly restricting the notion monologue to spoken language.
Smith et al. (2005), trace the strategies speakers use to establish referents in monologues in a research context when
introducing new characters in a retelling of a lm. They stress the importance of the quality of the assumed audience for
creating common ground. The assumed audience presumably plays an important role in vlogs, especially if there is a greeting
and terms of address. Discussing a type of monologue where the audience is present, Glick (2007) interprets a stand-up
comedians voicing of multiple parties to create a dialogic frame. The discussion of the monologue passages is not concerned
with reactions by the present audience (there is no sign of the audience in the transcription, either). Wells and Bull (2007), in
contrast, specically focus on afliative audience response in stand-up comedy and political speeches. They cite examples
where comedians invite reactions fromtheir audience by greeting and addressing themdirectly or stepping out of character.
Especially at the beginning of these routines, comedians use simple questions that elicit collective answers from the
audience. Clearly, comedians make use of the communicative channel between themand the audience with the expectation
of instant feedback. Though some of these rhetorical devices (greeting, term of address, simple question) are in some cases
present in vlogs, of course no immediate audience reaction is expected. Duman and Locher (2008) explore monologic
YouTube videos by the presidential candidates Obama and Clinton. The study focuses on the video exchange as conversation
metaphor that the candidates make use of while coping with the fact that technical limitations of YouTube do not permit the
immediacy/synchronicity of conversation (p. 205). This metaphor is created through the actual visual representationof the
candidates in the imitation of face-to face interaction (p. 202). In some of these videos, the candidates answer selected
questions that were posted as comments online which is a function of vlogs, too. These presidential candidates videos are
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 817
very similar to regular vlogs, though the words spoken are presumably very carefully composed by a whole team. Vlogs, too,
then might contain features that draw upon the source domain of conversation (p. 219).
4. Data collection and methodology
My examples were chosen to reect the practices that commonly occur in vlogs and not to highlight single occurrences of
selected phenomena. To be able to choose those recurring features, I collected 41 vlogs and analyzed them.
I entered the word vlog into the search function of the website YouTube, which insured that I collected only videos that
have the word vlog in the title. While this search did not turn up any of the vlogs that do not have the word vlog in the title,
it listed only those videos that are named, and thus probably considered, vlogs by the users. I collected monologic, English
data only, ltering out videos in other languages or those with several speakers present.
Transcribing the rst fewmoments of the vlog and listing their features provided me with statistics on which to base my
choice of examples. I noted down whether:
- there was an opening sequence containing written language and/or music resembling openings credits of a movie or TV
- it was shot indoors or outdoors
- there was a greeting
- there was a term of address
- there was self-identication
- the vlogger stated the date of lming
- there were discourse markers
- the vlogger used cuts as an editorial feature
Further, I noted down the vloggers age, the number of subscribers, and the country of residence, which can be found on the users
prole page (except sometimes the age, which some users do not give in these cases I lled in a rough estimate). To check
whether there was an effect of time difference throughout the English-speaking world on country of residence, I checked another
40 vlogs at a later time in the day (rst 41: 12 noon; second 40: 17 CET) for that feature. In my data collection I did not include
those vlogs that featured more than one speaker, since in this study I amconcerned with monologues without a present audience
The 41 vlogs yielded the following statistical data: 18 were done by female, 23 by male vloggers; 11 vlogs were
headed by a sequence containing written opening credits (and sometimes music), 30 started directly with a person
talking or images of other things (e.g. trees, a computer screen, a street scene); 8 vlogs were lmed outdoors, 32 indoors,
1 has audio only accompanied by a still image; 24 contain greeting terms, 17 vloggers do not greet their audience; 22
vloggers used terms of address for their audience, 19 did not (3 without greeting had a term of address, ve with
greeting had no term of address, 14 without greeting did not have term of address either, 19 with greeting also had a
term of address); 26 vloggers had no self identication, 5 had their (user)name in the opening credits or the title, 10
verbally stated their (user)name in the opening sequence; 6 vloggers used cuts to edit their videos, 35 did not; 24
vlogger had discourse or boundary markers in their opening sequences, 6 of which used them as their rst word, 17
used no such markers; 29 did not mention the date at all, 8 had it in the title or credits, and only four mentioned it
verbally; the youngest vlogger was 10-15, the oldest 50-60, the average age was 25; the lowest number of subscribers of
a user was 1, the highest 263,426, the average was 10,862; 62 (of 81) users are US Americans, 9 are British, 6 are
Canadians, 2 Australians, 1 each are French, German, and Dutch.
Thenext analytical stepwas toexaminemorevlogs bythe sameusers toestablishhowmanyof themrepeatedlymadeuseof
the same, established patterns in their vlog openings. 20 vloggers videos were further examined, resulting in the following
2 had only one video uploaded; 4 had no recognizable pattern in their opening sequences; 14 had identical opening
sequences in at least 50%of their vlogs. Of these 14, 8 always started their vlogs with a greeting and termof address. The most
common greetings are hi, hey, hello. The most common terms of address are everyone, everybody, guys, and YouTube and
youtubers. 3 vloggers regularly employed only one of the two features either term of address or greeting. The remaining 3
used linguistic markers (okay), the date, or had several patterns that they used equally often. This small sample suggests that
a majority of vloggers recurrently use identical or very similar opening sequences in their vlogs. Generally, there seems to be
a tendency for vloggers to employ the structure greeting plus term of address.
Based on these statistics the examples analyzed in this article were chosen. The selection represents the most common
elements of vlog openings. In the following there will be close analyses of vlog openings that contain greetings, terms of
address, self-identication, discourse or boundary markers, cuts, the date, objects or animals that are the subject of talk,
opening credits, and there will also be an example where none of these elements occur. Further, the analysis will illustrate
how vloggers who have established a pattern in their opening sequence deviate from that pattern.
The transcription conventions can be found in the appendix. Links to the videos are included in the references.
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 818
5. Types of openings in the data
The rst part of this section presents examples that contain the features of opening sequences listed in the previous
section in various combinations. It also describes examples that are marked by the absence of these features. The second part
deals with examples by users who have established their own recognizable opening sequence.
5.1. The features of opening sequences
5.1.1. Opening containing greeting, term of address, self-identication, linguistic markers
The rst example is by a male US-American in his late fties (all examples are headed by their original titles in bold print).
1 ((laughs)) {waving}
2 hey YouTubers its me Zipster.
3 okay,
4 so,
5 today I was thinking about um,
6 a couple of weeks ago me an um,
7 me an Sign543,
8 had to go to the DMV cause,
This vloggers opening sequences are started off by laughter and a stylized kind of waving. He greets his audience with the
informal hey and addresses them as YouTubers. This term of address reects the afliation with the video hosting
website (YouTube) that he assumes for his audience. At the same time, it subsumes a yet unknown number of people under
their only relevant common property, i.e. they are all users of YouTube. Therefore, this reects a certain degree of anonymity
inherent in some communication on the internet: in this case it is single-blind, in that at the time of production the speaker
does not know who will be watching the video, whereas at the time of reception, the viewer knows exactly who is talking.
Later, the vlogger sees at least howmany views his or her vlog has, and if there are comments or video responses, gets a sense
of who the viewers are.
Its me Zipster introduces the vlogger by his user name, which is also the name he gives to his character. Zipster is in
apposition to me, which implies that this is not the rst video he has posted, and that his viewers should recognize him.
The whole clause is reminiscent of the beginnings of answering machine talk, where a caller identies him- or herself. Since
in the case of vlogs, identication is facilitated by the accompanying images of the speaker, and since the navigation through
the internet/YouTube that is involved in playing a vlog raises strong expectations about whose vlog one is about to watch, I
assume that self identication might only be a secondary purpose of this clause. Rather, Zipster uses this recurrent bit in
most of his vlogs to coin his own catchphrase and thus establish a unique character.
The fact that Zipster borrows this sort of introduction from answering machine talk, or possibly radio or TV talk, reects
the lack of conventions for the genre vlog. This is an example of a vlogger who apparently feels the need to have an opening
sequence that is recognizable as such, and since there is no standard wording available, he provides a make-shift adaptation
from another genre.
In lines 3 and 4, there are two discourse markers okay and so that signal the transition from the introductory unit to
the main body of the vlog, which in this case begins with a narrative. Not only is there a shift topic-wise, but there is also a
change from an informal mode (laughter and waving), to a more serious tone (change in facial expression) that allows the
speaker to set up his story, leaving room for humorous build-up later on.
5.1.2. Opening containing greeting, term of address
Example 2 was posted by a male US-American in his mid-thirties.
Re: Who are you....Who, Who...Who, Who
1 hey Renetto.
2 uhm, {rubs his right eye with ngers}
3 lets see. {looks at piece of paper}
4 rst up.
5 yes I am a person.
6 no Im not a kid.
7 yes I have a job?
8 yes I have a life?
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 819
The signicant difference fromthe previous example is that here, a single, specic, real life person is being addressed (who is
most likely known to most viewers of this vlog, since this video was posted as a video response to a vlog by the user he is
addressing: Renetto). This speaker displays some nervousness through a gesture he makes (cf. line 2) while he bridges the
gap between the introduction and the topic with a lled pause/hesitation marker. He then continues in line 3 with the main
topic, which is headed by lets see. That introduces the answers to a list of yes/no questions which he apparently reads off a
piece of paper. Boththe addressing of a concrete personand the prepared topic outline facilitate the speakers performance of
the beginning of the vlog.
The title also points to the fact that the whole vlog is a unit of interaction, i.e. the answer to a question. Re: is
automatically added to the title of a video if it is uploaded as a video response. Thus this instantiates a type of interaction
where the basic unit is not a turn, but a whole video, which could of course be interpreted as a multi-unit turn.
5.1.3. Example containing term of address, self-identication, date
Example 3 was posted by a male American in his late fties.
Dollar, Paul Krugman
1 everyone,
2 its Peter Schiff,
3 this is Tuesday.. uh March sixteenth two thousand and ten.
4 well,
5 the Dollar was broadly weaker today,
6 uh,
This example starts with a term of address everyone (line 1), followed by self identication (line 2). Only 3 vlogs out of 41
contain a term of address but no greeting, cf. section on procedure. In line 3, the vlogger states the date of the video
production, mentioning the day of the week and, after a short hesitation, the date. The transition from these three elements
to the rst topic is instantiated by the discourse marker well (line 4). The rst three lines are the regular beginning this
vlogger uses to introduce most of his vlogs, which usually cover political topics. The date cannot only be found in the video
description that is automatically generated on the website, but also in the (written) title of this vlog. The additional mention
of the date in the spoken part would seemredundant if the only function was to informviewers of the date. As giving the date
is part of the vloggers xed routine, however it also stresses the topicality with which he discusses events. Moreover, it lends
the vlog the character of professional journalism.
5.1.4. Example containing linguistic marker, greeting, term of address, foregrounded object
Example 4 was posted by a female US-American in her fties. It illustrates two particular features: the use of a linguistic
marker as the rst word, and the use of an object to distract the attention.
Althouse #2
1 well-
2 ((laughs))
3 hi vlog fans,
4 uhm,
5 {puts on glasses}
6 8like the glasses,8
7 [{takes glasses off again}]
8 [8this time,8]
9 um I was uh,
10 yesterday.
11 uhm,
Before greeting the audience, the speaker uses well as the very rst utterance, followed by laughter. This is a boundary
marker indicating the beginning of the vlog. Combined with laughter, it may evince the speakers nervousness in front of the
camera, or her insecurity of how to start the monologue without instant feedback.
The marker well and the laughter do not only mark the transition from the situation before the vlog begins to the
greeting plus termof address; they also mark the fact that this particular vlogger has decided to make this the moment when
she starts recording, though she has no obligation whatsoever to do that.
The greeting and the termof address, both conventional in openings in multi-party conversations, showthat she transfers
conventions fromanother genre into a newgenre. Her choice of termof address, vlog fans (line 3), not only reects that she
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 820
does not knowher audience so that she has to identify them by their function as vlog viewers; it also makes the assumption
that these people are fans. This semi-humorous device seems to have the function of acknowledging the monologue
situation: by saying something that might conceivably be contested, albeit rather harmless, she reminds her viewers of the
obvious situation that they cannot react immediately.
In lines 58 follows an interactive feature in formof a question (the direct gaze into the camera and the intonation suggest
that this is a question), like the glasses this time. It is accompanied by the speakers putting on the glasses and taking themoff
again. Since the glasses are not used in their normal function to correct her vision, they serve some other purpose in this
sequence. Formally, they constitute the subject of a mini sequence that is inserted between the greeting (line 3) and the rst
topic of the vlog (line 9ff). The distraction caused by the glasses is presumably a device that the vlogger uses to postpone the
main body of talk in order to familiarize herself with the situation.
The added this time (line 8) suggests that there has been a previous occasion involving glasses. It can be speculated that
the vlogger has received comments about her glasses (written comments via YouTube or through other channels of
communication) that she acknowledges in turn. As in example 5.1.2, this reects the interactivity and recipient design of
vlogs. The same is true for the following example, where the vlogger explicitly mentions that she has received questions that
she now wants to answer.
5.1.5. Example containing linguistic marker, cuts
In the next passage, editing, especially cutting, plays an important role. This example is by a woman who, after posting
several other videos, is shooting her rst vlog. She is a US-American:
HappySlip #1
1 so instead of the usual HappySlip video,
2 I wanted to just connect with you guys,
3 and show you the girl behind the Slip.
4 {cut}
5 Im thinking that one of these vlogs is overdue.
6 just to answer some of the questions that you have,
7 {cut}
8 number one asked question is,
9 are you Filipino.
10 YES I AM. {gestures with paper roll in her hand}
The whole vlog shows no signs of hesitation, nervousness or other struggle with producing a monologue on camera.
However, a regular and salient feature is the frequent cuts, which probably helped eliminate all features that would have
rendered the vlog less uent.
The rst three lines, up to the rst cut, and lines 5 and 6, up to the second cut, introduce the vlog by way of mentioning its
purpose (to connect) and an outline of its contents (answer questions).
These lines are headed by the discourse marker so. Based on the intonation (level) and the inclusion into the intonation
unit that constitutes line 1, I assume that this so originally bracketed two units of talk. It is a marker of result rather than a
boundarymarker indicatingthebeginningof thevlog. Thus, thevlogger madeaneditorial decisionthat violates theconventions
of regular talk, creating a situation where viewers of the vlog get the impression that they have missed the beginning.
In lines 6 and 8, the vlogger mentions viewers questions, and in line 9 she quotes the question she receives most
frequently: are you Filipino. She subsequently answers that question in line 10. Thus, this example instantiates interaction
between a vlogger and her audience. Her taking up questions, quoting them as direct speech, and answering them clearly
indicates that vlogs are geared towards an audience. Discussing topics that are relevant to the audience is an incentive to
continue watching the vlog.
5.1.6. Example containing cut, linguistic marker
Example 5 is by the same vlogger that posted example 4.
Watching Hillary.
1 -nd I uh::
2 so I tried to do a little shopping,
3 didnt really get the shopping done,
4 uuuh didnt have any shopping to GET done.
5 I just didnt feel in the mood to shop,
6 I went to my favorite sto:re,
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 821
7 nothing.. nothing.. uh called out to me=
8 =as needing to be added to my... wardrobe.
9 ((laughs))
The rst line is headed by a discourse marker -nd, and it is aborted after the subject. Again, this marker indicates that there
was previous talk which fell victim to an editorial decision. In this case the editing of the video is foregrounded in that the
beginning of the word and is not even audible. Hence, the vlog starts in the middle of a word. Line 2 is then a completed
utterance, which is also headed by a discourse marker, so. This utterance is a repair of the incomplete and I uh:: in line 1.
Whether that repair took place because the speaker realized so was a more appropriate organizational marker than and
in that context, or whether she started out to say something completely different which she then discarded cannot be
determined. But still, the editors decision to keep the rst line in the video reects that she considers it a good place to start
the vlog, perhaps because it is a good entry into that particular unit or topic, even though viewers end up feeling that they
have missed a bit. Froman editors point of view, she might even have retrospectively assigned that whole unit and I uh:: a
new function, i.e. that of a boundary marker that marks the beginning of the vlog.
5.1.7. Example containing none of the elements
None of the elements that regularly occur in vlogs are present at the beginning of the following example (greeting, termof
address, linguistic marker, self-identication). Furthermore, no signs can be found that any previous part of the video footage
is missing. In other words, there are no ties to foregoing talk or footage. This example is by a woman who presents a comedic
vlog about the possibility of clouds as presidential candidate:
fuck all you cloud haters
1 what really pisses me off is that no one understands that clouds are the only possible good president there
could ever be.
2 no president.. in history has been able to oat through the sky?
3 okay?
4 neither Obama nor Clinton NOR MCCAIN.. are cumulus?
5 {cut}
6 theyre not stratus?
7 theyre not cirrus?
8 okay.. motherfuckers?
This passage starts with a complete grammatical sentence, including a wh- cleft and four clauses. There are no breath groups
or other signs of spontaneous spoken language. This may change in the lines afterwards. However, the opening sentence
sounds like the speaker either composed it beforehand, or even practiced it. This absence of features of spoken language
makes the speaker appear determined and strongly opinionated, which is in accordance with the angry expression on her
face as well as the exaggerated use of offensive language. Given the unrealistic subject of her vlog, it is obvious that the
speaker is not in a serious mode, but that she is displaying fake anger. As a means of stressing the (pretense) immediacy of her
concern, she chooses to skip the opening sequence.
5.2. Variations from established, personal openings
Established, idiosyncratic phrases or behavioral patterns are an important part of some vloggers routines. Zipster, cf. ex.
1, for example, uses the introductory phrase Hey YouTubers, its me, Zipster in almost all of the videos he posts online. He
deviates from his routine when acting as a different character, using another phrase. As of date, Zipster has uploaded over
840 videos to his prole page. Out of 20 randomly picked vlogs in which he appears as the character Zipster, 15 contain the
established opening. 4 contain a variation, and 1 does not adhere to the pattern at all.
The following example illustrates a seemingly unplanned alternate of his usual pattern caused by a distracting noise.
Often these atypical openings contain an allusion to the vloggers typical opening or drawon themas a resource of humor. At
times, the established introductory phrases are simply postponed to a later point in the video.
More random thanks
1 Zipster: [((laughs))]
2 voice: [yahoo mail,]
3 Zipster: yahoo mail, {bends over towards camera,}
4 Jesus Christ. {turns down speaker volume}
5 rstest thi
ng in the da
mn video.
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 822
6 hey YouTubers its me Zipster.
7 I gotta tell you.
8 I wasnt even gonna do a video tonight,
This opening sequence starts out as expected with laughter, but the usual sequence is disturbed. The interruptive voice in
line 2 is an automated sound alert that noties the user of this email provider that a newemail message just arrived: yahoo
mail. Zipster takes that up immediately by repeating the exact line yahoo mail, which is followed by Jesus Christ,
expressing annoyance at the disturbance. At the same time Zipster turns down the volume of his speakers to prevent further
incoming messages from interrupting him. The immediacy of his verbal and physical reactions suggests that this situation
has occurred before, and that Zipster is following a strategy developed in response to it. This sound alert occurs in other
videos by Zipster and is therefore known to regular viewers of his vlogs. Zipster even sometimes thematizes that alert and
the disruption it causes by addressing the voice (e.g. by answering thank you, Christine or shut up, Christine where he
personies the pre-recorded voice of an unknown person by giving it a name). Thus we can assume that there is a shared
history between Zipster and his viewers that he presupposes when he says rstest thing in the damn video in line 5. He can
rely on the viewers to understand who is talking and why, and that he nds that interruption undesirable. The formrstest
possibly goes back to the rst vlogs he posted, where his character was supposed to portray a mentally retarded person
whose use of ungrammatical forms and mispronunciations is very prominent. This and the laugh pulses in thing and
damn signal that he is in a less serious mode than his wording might suggest.
It is important to remember that vloggers have two phases of production: rst the actual uttering of their intonation units,
andsecondlytheeditingprocess. Thus, vloggers cangobackintimeandtakebackor alter what theysaidbycuttinganddeleting
scenes. Here, Zipster chooses to leave three intonation units (lines 35), that are clearly deviant fromhis usual pattern, in the
vlog. Theyshowhiminasituationwherehebrieyloseshistemper or pretends todoso. Onlyafterwards does heproceedwith
his regular introductory pattern, namely in line 6 hey YouTubers its me Zipster which he utters with a clearly altered voice.
The vloggers decision to leave this passage inthe vlog instead of cutting or reshooting the scene indicates that Zipster assumes
that something can be gained from it at the cost of being inconsistent with his well-established characteristic opening.
In the analysis of this vlog opening, but especially in the three intonation units 35, identity construction is a salient issue.
The viewers see an irascible Zipster who is prompted into swearing for a brief spell. This introduces humor (cf. laughter
pulses in line 5) as a strategy to mend the pattern and return to the usual opening sequence. At the same time, Zipster
establishes another recurrent (though unplanned) element of his vlogs, namely the prerecorded message spoken by a voice
he calls Christine. Instead of turning off the speaker before taping the vlog, he retains the element of surprise, which allows
himto digress and possibly introduce humor. Also, this makes the whole vlog look less structured and less professional. This
feature allows for interaction, and it introduces a second character with a certain will of its own, since it is not directly
manipulated by Zipster himself. A situation where one has to react spontaneously can help the production of an entertaining
monologue insofar as it makes it less monotonous.
Considering both levels of production allows for a further distinction. In this case, the interruption was not planned
(though not prevented either) at the level of speech production since email messages come in unexpectedly and hence also
the sound alerts. However, at the level of editing there was a clear decision to retain the interruption in the video. The next
example also illustrates a deviation from an established pattern.
The vlogger is a female US-American in her late twenties. She is preparing to watch a live video broadcast while lming
herself for the vlog. The phrase she regularly starts her vlogs with is hello everyone its Katie again.
On YouTube Live
(music sequence withimages of the vlogger, her dog, a painting, noodles; the user name k80blog! appears at the end in
orange letters)
1 okay.
2 so this YouTube live is gonna start in one minute.
3 and.. hello everyone its Katie again.
4 and this this uh YouTube live event,
5 of the uhm of our lives really.
6 uh it- it seems to be- the way YouTube has advertized it=
7 =its going to be... hisTORic.
8 uhm.
9 the wa:y theyre going to.. IMitate television=
10 =its really going to be historic.
This vlog starts with opening credits and music. The background images show entities that regularly appear in this users
vlogs. It reminds one of opening credits of TV series, where images of familiar characters and places are shown to a signature
melody with episode title shown on screen.
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 823
K80blog opens witha variation of her usual opening sequence in that the rst two intonation units make it seemlike she
is going to start the vlog without introducing herself or greeting the audience at all. All she puts in front of the main topic is
two markers okay and so. The rst, okay is a boundary marker. It marks the boundary between a pre-posed sequence
of music to the sequence that shows the vlogger during speech production. It is not a discourse marker according to
Schiffrins (1987) denition, which states that discourse markers bracket units of talk, since there clearly is no foregoing
Sinclair and Coulthard (1975), referring to classroom talk, note that the boundaries of what they call transactions are
typically marked by frames, a category which comprises ve items: okay, well, right, now, and good. They say about
teachers talk that a frame invariably occurs at the beginning of a lesson, marking off the settling-down time. In the
teachers monologic passages, these frames often co-occur with so-called foci, as Sinclair and Coulthards example
frame well
focus today I thought wed do three quizzes
They argue that this sort of topic introduction is typical of classroom discourse as opposed to conversation outside such
settings because of the distribution of control that allows a teacher to decide on a topic or an activity. Vloggers are in a similar
situation: as they are the only speakers, the choice of topic is completely up to them. Unlike the teacher in the classroom
situation though, the vloggers viewers can turn off the video at any time during the vlog. Thus, vloggers would probably do
well not to stress their control of topic too much. This way, the viewers sense of voluntary participation is preserved, a
feature lacking in classroom discourse. The downside of that control for both vloggers and teachers is the obligation to ll a
turn that often exceeds the usual turn length in multi-party conversations by far. Given these similarities, it is not surprising
that individuals in such situations develop similar strategies.
So, which is at the head of the second intonation unit (line 2), serves to reintroduce a topic that has apparently been
discussed before. This is indicated by the use of the demonstrative determiner this, which presupposes common
knowledge. It is a matter of speculation whether that knowledge derives from previous multi-party interaction or just from
the title of the vlog (On YouTube Live). The topic is an event that is central to many vloggers because they themselves,
namely the active users of the website YouTube, are its subject. K80blog uses a context which is the sumof experience of all
kinds of interaction that has taken place among the vlogger community about YouTube Live. Thus, on the local, sequential
plane, so links two units of talk. On a global level, or in this case a super-global level, it links back to discourse that might
have taken place in a different setting, through different media, where K80blog might have been just a passive listener etc.
The vlogger makes an implicit assumption about her audience, namely that there is a collective experience that enables the
viewers to understand what she is referring to.
The next line nally provides the element that usually comes right at the beginning of K80blogs videos: and.. hello
everyone its Katie again. It is headed by marker and, another discourse marker. The switch from the introduction of
the topic in lines 1 and 2 to the prefabricated opening in line 3 is accompanied by a shift in her facial expression. It
changes from a somewhat serious mode to an open friendly smile exactly when she utters and while slightly shifting
her head from right to left (viewers perspective). Thus, the shift from main topic talk to introductory intermission that is
verbally enacted by a discourse marker is underscored multimodally by shifts in expression and posture. The lexical
choice of and as shift facilitator (rather than so) is interesting in that it seems to be completely devoid of its
grammatical role (Schiffrin, 2001) as a conjunction, as it occurs between two elements that are structurally not just
different, but incommensurable.
Finally, in line 4 there is a shift back to the topic introduced in the rst two lines. Likewise, the facial expression gradually
goes back to a more serious mode. Again, she uses the discourse marker and to realize that shift, which, this time, ts better
into the syntactic context.
6. Conclusion
In this paper I have presented and analyzed various opening sequences in vlogs. I have shown how speakers manage to
deal with the self-imposed task of producing a monologue on camera, and especially how they handle the transition from
what happens before the vlog begins to the initial verbal expressions in the video. Speakers on vlogs use different strategies
in their openings, which can be verbal or non-verbal. These strategies include linguistic means such as recognizable patterns,
or the use of other media (computerized voice), non-verbal distractive devices (showing glasses), which can trigger side-
sequences, and video editing. Vlogs include interactive elements such as questions, greetings, and terms of address, which
are borrowed from dialogic genres or conventionalized monologues. These phenomena are evidence of audience design,
which underlies the production of most monologic media talk.
The use of these strategies results in a categorization of vlog openings. The main distinction is that between openings that
follow a users idiosyncratic pattern, and thus appear in the majority of that vloggers videos, and those that do not rely on
such a routine. As this is not a structural distinction (openings fromeither category could have the exact same structure, e.g.
greeting, term of address, self-identication), it requires studying several vlogs by the same user.
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 824
The following are my conclusions regarding monologues: Assuming that dialogue is the default setting for spoken
interaction the mode in which language is rst acquired and most frequently used monologue, in being reduced to the
productive part of dialogic conversation and stripped of the receptive part, should be more difcult, or at least different than
face-to-face conversation. The suspension of turn taking disallows all standard formulaic exchange patterns that involve
sequencing, unless the speaker uses incomplete forms, or impersonates both speakers. Thus, the observation that speakers
employ alternative, adapted strategies in monologues is not surprising. The xed introductory phrases I have presented
remind one of conversational openings from other genres (TV, radio, answering machine messages, etc.), which can be very
ritualistic as well. Editing, which constitutes the second phase of language production in vlogs, allows for strategies that
would be very difcult to realize in other monologic genres: cutting off parts of words or phrases, or moving elements to
initial position that, when they were rst uttered, served as brackets between two units. Monologues in vlogs also display
elements that are employed in other monologic genres: greetings and terms of address occur (cf. news reports), and so do
boundary markers/frames (cf. classroom talk, lectures). Furthermore, multimodal features such as shifts in gaze, facial
expression and posture support the vloggers verbal output.
Regarding openings, one can conclude that in vlogs, standard conversational openings such as question/answer- and
summons/answer-pairs do not work. On the other hand, alternate conversational elements such as terms of address and self
identication work and occur. What is interesting with regard to openings in general is that some vloggers try to
conventionalize themand turn theminto routines. Others completely deny a need for an introductory phase, so that they do
not establish personalized routine formulae. Why there are different strategies to begin a vlog, and which one a vlogger
chooses, ultimately reects pragmatic and sociolinguistic variation: speakers use different degrees of formality, choose
different registers, address different imagined audiences, discuss different topics, have different aims, have different socio-
economic backgrounds, etc. Some vloggers possibly have introductory phases to prepare their viewers for topics discussed
and create a sense of completeness of the vlog as a unit, to justify their discussing it in the rst place (this also occurs at the
ends of vlogs with phrases like I just thought Id share this with you). Viewer feedback in forms of comments and ratings
probably play a role when a vlogger designs their vlog content so that it passes as tellable. Others, by leaving out a formal
introduction, perhaps seek to create a sense of immediacy.
The fact that some users do have an introductory phrase, even though negotiation of factors such as availability for
interaction, identication, social status, and alignment of participants does not take place in vlogs, suggests some functions
of vlog openings. It reects that some vloggers have an imagined audience in mind that they are addressing, which might be
helpful in the process of producing a monologue to a camera. It also acts as incentive for the audience to understand the vlog
as part of an asymmetric, asynchronous interaction, inviting the viewers to respond via the communicative channels that
YouTube offers. An interesting avenue for future research would be to compare the impact of vlogs with and without opening
phrase, in terms of the interaction they initiate. Such research could be conducted for example by comparing the number of
views, comments, video responses and the ratings. However, since these features reect back on the complete video, not just
the opening sequence, it would be very difcult to isolate that variable.
Regarding the medium vlog as part of CMC, we can say the following: Why people engage in an activity that seems
more demanding than other pastimes is a matter of speculation. An initial explanation is that these monologues reaching
a potentially large audience are an ideal ground for representation of the self, creation of a persona, or generally the
construction of identity. Many vloggers state in metareferential comments about their hobby that they like the
interaction and the community. Clearly, the written comments users can make below a video and other features of the
website are used extensively by vloggers and viewers to communicate and create identities. The fact that this happens
online allows users to establish relationships with people that are physically so far distant that they would generally not
meet ofine. As part of these relationships, vloggers and their viewers develop recurrent topics and thus establish
common knowledge. Hence, in vlogs there is exophoric reference to both common world knowledge and very specic
community based knowledge.
More research on vlog openings is required to illuminate what functions these sequences fulll. Such research should
reveal more about the use of terms of address, which were only touched upon in passing here. Also the occurrence of silences
and pauses at the beginning of vlogs should result in interesting insights into monologues (cf. silence/pauses in lectures).
Observing vlog openings over time will also reveal whether there is a tendency to establish general conventions for the
genre, such as xed phrases that are used by all vloggers. Research on gestures in monologue could surely be further
expanded with the present video data as well. Further research might focus on endings of vlogs and complete vlogs to reveal
opening sequences
i a) sequence containing a combination of any of the following
features: opening credits, greeting, term of address,
self-identication, linguistic markers, cuts, distractive devices
! i b) comparing several vlogs by the same
vlogger reveals the class of prefabricated,
established personal openings
i c) once a user has established their personal
opening phrase, they can use variations on it
ii vlog opening containing none of the above listed elements
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 825
for example how these spontaneous monologues are structured. Similarly, this research could shed light on how the use of
terms of address and direct address mirrors expectations vloggers have about their audience. Clearly, much work remains to
be done on this fairly new, audio-visual CMC genre.
Appendix A. Transcription conventions
shes out. Period shows falling tone in the preceding element; suggesting nality.
oh yeah? Question mark shows rising tone in the preceding element; cf. yes-no question intonation
so, now, Comma indicates a level, continuing intonation; suggesting nonnality.
bu- but A single dash indicates a cutoff (often with a glottal stop); including truncated
intonation units.
DAMN Capitals show heavy stress or indicate that speech is louder than surrounding discourse.
8dearest8 Utterances spoken more softly than the surrounding discourse are framed by degree signs.
says oh Double quotes mark speech set off by a shift in the speakers voice.
(2.0) Numbers in parentheses indicate timed pauses.
If the duration of the pauses is not crucial and not timed:
.. a truncated ellipsis is used to indicate pauses of one-half second or less.
... An ellipsis is used to indicate a pause of more than a halfsecond.
ha:rd The colon indicates the prolonging of the prior sound or syllable.
<no way> Angle brackets pointing outward denote words or phrases that are spoken more
slowly than the surrounding discourse.
>watch out< Angle brackets pointing inward indicate words or phrases spoken more quickly
than surrounding discourse.
[and so-]
[WHY] her? Square brackets on successive lines mark beginning and end of overlapping talk;
multiple overlap is marked by aligning the brackets.
=then Equal signs on successive lines show latching between turns of different speakers;
they can also indicate that the turn of one speaker continues after e.g. backchannels
of interlocutors.
H Clearly audible breath sounds are indicated with a capital H.
.h Inhalations are denoted with a period, followed by a small h. Longer inhalations are
depicted with multiple hs as in .hhhh
h Exhalations are denoted with a small h (without a preceding period). A longer
exhalation is denoted by multiple hs.
.t Alveolar suction click
( ) In the case that utterances cannot be transcribed with certainty empty parentheses
are employed
(hard work) If there is a likely interpretation, the questionable words appear within the parentheses.
/ / slashes are used for phonetic transcriptions
((laugh)) Aspects of the utterance, such as whispers, coughing, and laughter, are indicated with
double parentheses.
{points at board} Nonverbal behavior, such as movements and looks, and video editing, such as cuts,
are indicated with braces.
Numbering conventions: Number each intonation unit consecutively (e.g. from 1 to n).
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 826
Aijmer, Karin, 2007. Idiomaticity in a cultural and activity type perspective conventionalisation of routine phrases. In: Skandera, Paul (Ed.), Phraseology
and Culture in English (Topics in English Linguistics). de Gruyter, New York, pp. 323349.
Clark, Herbert H., 1996. Using Language. CUP, Cambridge.
Duman, Steve, Locher, Miriam, 2008. So lets talk. Lets chat. Lets start a dialog: an analysis of the conversation metaphor employed in Clintons and
Obamas YouTube campaign clips. Multilingua 27, 193230.
Ess, Charles, and the AoIR ethics working committee, 2002. Ethical decision-making and Internet research: recommendations from the aoir ethics working
committee, Approved by AoIR, November 27, 2002.
Garrod, Simon, Pickering, Martin, 2004. Why is conversation so easy? Trends in Cognitive Sciences 8 (1), 811.
Glick, Douglas, 2007. Some performative techniques of stand-up comedy: an exercise in the textuality of temporalization. Language & Communication 27
(3), 291306.
Gold, Ruby, 1991. Answering machine talk. Discourse Processes 14 (2), 243260.
Herring, Susan C., Scheidt, Lois Ann, Wright, Elijah, Bonus, Sabrina, 2004. Bridging the gap: a genre analysis of weblogs. In: Proceedings of the Thirty-seventh
Hawaii International Conference on System Sciences (HICSS-37). IEEE Press, Los Alamitos, CA.
Herring, Susan C., Scheidt, Lois Ann, Wright, Elijah, Bonus, Sabrina, 2005. Weblogs as a bridging genre. Information Technology & People 18 (2), 142171.
Horton, Donald R., Wohl, Richard, 1956. Mass communication and para-social interaction: observations on intimacy at a distance. Psychiatry 19, 215229.
Labov, William, Fanshel, David, 1977. Therapeutic Discourse: Psychotherapy as Conversation. Academic Press, New York.
Laver, John, 1975. Communicative functions of phatic communion. In: Kendon, A., Harris, R., Key, M. (Eds.), The Organization of Behavior in Face-to-face
Interaction. Mouton, The Hague, pp. 215238.
Laver, John, 1981. Linguistic routines and politeness in greeting and parting. In: Coulmas, F. (Ed.), Conversational Routine: Exploration in Standardized
Communication, Situations and Prepatterned Speech. Mouton, The Hague, pp. 289330.
Liddicoat, Anthony, 1994. Discourse routines in answering machine communication in Australia. Discourse Processes 17, 283309.
Montgomery, Martin, 1981. The Structure of Monologue In: Coulthard, M., Montgomery, M. (Eds.), Studies in Discourse Analysis. Routledge, London, pp.
Rintel, Sean, Mulholland, Joan, Pittam, Jefferey, 2001. First things rst: internet relay chat openings. Journal of Computer-mediated Communication 6 (3).
Scheidt, Lois, 2009. Diary Weblogs as Genre. Retrieved from: (last access
June 29, 2010).
Sinclair, John, Coulthard, Michael, 1975. Toward an Analysis of Discourse: the English Used by Teachers and Pupils. OUP, Oxford.
Smith, Sara W., Noda, Hiromi Pat, Andrews, Steven, Jucker, Andreas H., 2005. Setting the stage: how speakers prepare listeners for the introduction of
referents in dialogues and monologues. Journal of Pragmatics 37, 18651895.
Schegloff, Emanuel A., 1967. The First Five Seconds: The Order of Conversational Openings. Unpublished PhDdissertation, University of California, Berkeley.
Schegloff, Emanuel A., 1968. Sequencing in conversational openings. American Anthropologist 70, 10751095.
Schegloff, Emanuel A., 1986. The routine as achievement. Human Studies 9, 111151.
Schiffrin, Deborah, 1987. Discourse Markers. CUP, Cambridge.
Schiffrin, Deborah, 2001. Discourse markers: language, meaning, and context. In: Schiffrin, D., Tannen, D., Hamilton, H. (Eds.), The Handbook of Discourse
Analysis. Oxford, Blackwell, pp. 5475.
Tolson, Andrew, 2005. Media Talk: Spoken Discourse on TV and Radio. Edinburgh University Press, Edinburgh.
Waldvogel, Joan, 2007. Greetings and closings in workplace email. Journal of Computer-Mediated Communication 12 (2), In:
Wells, Pam, Bull, Peter, 2007. From politics to comedy: a comparative analysis of afliative audience responses. Journal of Language and Social Psychology
26 (4), 321342.
Data sources:
Example 1: THIS is STUPID!!!
Example 2: Re: Who are you....Who, Who...Who, Who
video removed by user
Example 3: Dollar, Paul Krugman
Example 4: Althouse vlog #2
Example 5: HappySlip #1
Example 6: watching Hillary
Example 7: fuck all you cloud haters
Example 8: More random thanks
Example 9: On YouTube Live
Maximiliane Frobenius is a research assistant and doctoral student at the Department of English Linguistics at Saarland University, Saarbrcken, Germany. Her
eld of research is media English, especially CMC and language on television.
M. Frobenius / Journal of Pragmatics 43 (2011) 814827 827