WHATSAPP VOICE MESSAGING AS AN EMERGENT DIGITAL PRACTICE: A MULTI-METHOD ANALYSIS

In this contribution, I describe the development process of my Master's thesis, in which I investigated WhatsApp Voice Messaging through a multi-method approach, using Conversation Analysis in combination with ethnographic methods. This exploratory study allowed me to classify WhatsApp Voice Messaging within Susan Herring's Faceted Classification Scheme for Computer-Mediated Discourse (2007). In the first part, I introduce Voice Messaging and provide some background for the reader to understand the elements underpinning the multi-method approach I used. Next, I describe my methodology step by step. Then I take the reader through a detailed analysis of three selected findings, assembling complementary results. Finally, I list the limitations of my methodology and provide some orientation for future research.


Introduction
In 2013, the world-famous Instant Messaging platform WhatsApp added the possibility to send Voice Messages. As more and more people use this functionality, it appears to be a great success. Once a recipient has been selected, the user presses a button and holds it for any amount of time, during which their voice is recorded. The Voice Message is sent to the recipient as soon as the user lets go of the button (see Figure 1). The use of WhatsApp Voice Messaging positions the speaker in a hitherto unknown digital interactional context, which raises many questions, ranging from the user's management of non-co-presence to the possibility of classifying such messages in current linguistic research.
This contribution focuses on the development of the methodology I implemented in my mémoire thesis, which aimed at describing and classifying WhatsApp Voice Messaging in the existing body of linguistic literature. I characterised WhatsApp Voice Messaging as an emergent type of Computer-Mediated Communication (CMC) since it appeared as an innovative digital interactional setting.

Getting started
I set out to gather all previous research that could be relevant to the study of WhatsApp Voice Messaging, which involved an extensive literature and internet search. Yet I realised that, notably because of how recently it came about and because it involves spoken online behaviour, previous research had seldom focused on online communication platforms with similar features. Though it might resemble a message that one leaves on an answering machine, notably as to the asynchronicity of talk, non-co-presence and the spoken aspect of that communication medium (see Gold 1991; Alvarez-Caccamo & Knoblauch 1992; Buzzanell et al. 1996), this precise form of CMC is all the more intriguing insofar as Voice Messages have, to this day, not received any direct attention from a linguistic point of view. Indeed, "CMC delivered through channels other than text […] calls for systematic examination" (Herring 2011: 4). I therefore concluded that I was dealing with a very specific "niche" of linguistic study. I consequently understood that my research would be exploratory, and that it had to take root in an authoritative framework in order to aptly situate WhatsApp Voice Messaging in the broader field of CMC. Early in my research, I came to recognise the importance of Susan Herring's instrumental Faceted Classification Scheme for Computer-Mediated Discourse (2007). This classification helps to understand and approach emerging ways of interacting online, and therefore to give the as yet unstudied WhatsApp Voice Messaging a place to be in this field.

Herring's classification
The classification rests on a first distinction: in order to classify all types of CMC, one must take into account that computer-mediated discourse is subject to "two basic types of influence: medium (technological) and situation (social)" (Herring 2007: 10). Thus, the classification is divided into two equally important sets of factors, each relevant either to the medium or to the situation of communication.
Medium factors "attempt to discover under what circumstances specific system features affect communication, and in what ways" (Herring 2007: 11). They are: Synchronicity; Message transmission; Persistence of transcript; Size of message buffer; Channels of communication; Privacy settings; and Message format. Situation factors capture contextual information such as the relationship between participants, the topic covered in communication, and so on. The use of "situation factors assumes that context can shape communication in significant ways, although it does not assume that any given factor is always influential" (ibid.). They are: Participation structure; Participant characteristics; Purpose; Topic/theme; Tone; Activity; Norms; and Code.
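The two sets of factors listed above can be sketched as a small lookup structure, which is convenient when answers or observations later need to be tagged as medium-related or situation-related. This is only an illustrative sketch: the names `HERRING_FACTORS` and `factor_type` are hypothetical and not part of Herring's scheme or of the thesis; the factor names themselves are copied from the list above.

```python
from typing import Optional

# Factor names as listed in the paragraph above (after Herring 2007).
# The dictionary layout and the helper below are illustrative assumptions.
HERRING_FACTORS = {
    "medium": [
        "Synchronicity",
        "Message transmission",
        "Persistence of transcript",
        "Size of message buffer",
        "Channels of communication",
        "Privacy settings",
        "Message format",
    ],
    "situation": [
        "Participation structure",
        "Participant characteristics",
        "Purpose",
        "Topic/theme",
        "Tone",
        "Activity",
        "Norms",
        "Code",
    ],
}


def factor_type(name: str) -> Optional[str]:
    """Return 'medium' or 'situation' for a listed factor, else None."""
    wanted = name.strip().lower()
    for kind, factors in HERRING_FACTORS.items():
        if any(wanted == factor.lower() for factor in factors):
            return kind
    return None
```

Returning `None` for an unlisted name leaves room for observations that do not fit either set, in keeping with the open-ended nature of the classification.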

Theoretical background
Although no previous research addressed WhatsApp Voice Messaging directly, my preliminary investigation revealed that several of its characteristics had been hinted at in a number of past publications. Without going into detail here, my theoretical framework stems from several interconnected bodies of research.
These include publications on language in Walkie-Talkie interaction or in answering-machine messages, such as Szymanski et al. (2006: 393), who commented on the "remote state of incipient talk" enabled by the technological affordances of Push-to-Talk radios, and Keith W. Ross (2003), who wrote a 'Personal Account' of what he calls "asynchronous voice", which corresponds to Voice Messaging. Interestingly, more than a decade after Ross hypothesized that "when a technology platform enables communicating with asynchronous voice to become as effortless as communicating by telephone or by email, asynchronous voice just might someday attain the coveted killer app status" (Ross 2003: 74, my emphasis), this development has in fact taken place.
Susan Herring divides the study of language into four "domains": structure, meaning, interaction and social behaviour (see Table 1; Herring 2004: 18). As it is significant for the study of Instant Messaging, I decided to focus my attention on the domain of interaction. Table 1 shows that the level of interaction is particularly relevant to the present research, since this research explores linguistic phenomena in, and attitudes towards, WhatsApp Voice Messaging while using both Conversation Analytical and ethnomethodological approaches.
Finally, and probably most notably, Christopher Jenks (Brandt & Jenks 2013; Jenks 2014) has been active in, quite literally, giving a voice to CMC through the creation of a new investigative mode: Computer-Mediated Spoken Interaction (CMSI). Research in this field is currently aimed at the use of synchronous CMSI in educational contexts. Although its asynchronous facet, which WhatsApp Voice Messaging is part of, "is an important area of investigation" (Jenks 2014: 36-7), it has not yet received any attention. Jenks' method involves Conversation Analysis and uses a specific set of transcription conventions, which I chose to employ in this study.
With this theoretical background, I decided to design my mémoire around a combination of two complementary methods: a more general and qualitative approach to WhatsApp Voice Message users, as well as a more precise study of the conversation itself. Using a multi-method approach makes it possible to broaden theoretical premises and is moreover valued in today's academic culture. I aimed to bring together the results and observations from complementary methods in order to classify WhatsApp Voice Messaging as an emergent type of CMC in an empirically informed manner.

A multi-method analysis
To carry out my study in a systematic way, I designed three research questions aimed at a competent and comprehensive study of WhatsApp Voice Messaging. In order to answer the first question, I designed a qualitative ethnographic questionnaire and a technolinguistic biography specifically for this study (see Page & Barton 2014 for a detailed description of these methods) and conducted them with ten participants. For the second question, five to six minutes of Voice Messages were collected from each of those participants in order to study how they use Voice Messages from a Conversation Analysis (CA) point of view. Using both of these methods together helped to approach and discuss the domain of "interaction" (see Herring 2004: 18) in this particular CMC setting in a focused manner. Finally, answering the third question with the aid of Susan Herring's (2007) Faceted Classification Scheme for Computer-Mediated Discourse shed light on the place that WhatsApp Voice Messaging occupies in the field of linguistic research.
Once the structure of my exploratory study was clear, I duly completed and submitted a research ethics clearance form provided by my supervisor. I began my data collection as soon as I was granted approval for my project.

An Ethnographic Questionnaire and a Technolinguistic Biography
A user-based approach is most likely the best way to recognise WhatsApp Voice Messaging as an emergent form of Computer-Mediated Communication. As explained in Page and Barton's Introduction to Social Media (Page & Barton 2014), ethnographic approaches "provide a coherent way of thinking about language and the Internet" (Page & Barton 2014: 104), insofar as "it is important to see people's perspectives in any situation and so to provide an insider's view" (ibid: 108).

The ethnographic questionnaire
This prompted me to design an ethnographic questionnaire specifically for this study. Each of the factors of Herring's classification was addressed by at least one of its 32 questions (see Annex 1). To give an example, the factor "message transmission", i.e. "whether or not simultaneous feedback is available during message exchange" (Herring 2007: 14), was approached with the following question:

How do you feel about not hearing the other's reactions when you are saying a Voice Message? (Annex 1, Question 2)
Among other things, the participants admitted feeling "stupid", "dumb" or even "weird" when they were producing a Voice Message, thereby hinting at their attitude towards this particular interactional context. The answers collected thus made it possible to describe the attitudes of different users in an empirically driven manner and to give perspective on the factors of Herring's classification.

The technolinguistic biography
After answering the ethnographic questionnaire, the participants provided a quick technolinguistic biography. This consisted of answering several question types (see chapter 7 of Page & Barton 2014 for more details), such as:
- life-history questions: When did you acquire your first smartphone?
- a-day-in-the-life questions: Describe your use of WhatsApp Voice Messaging in a normal day.
- questions prompting participants to reflect on transitions: When did WhatsApp Voice Messaging become part of your everyday online practices?
(adapted from Annex 2)
With five additional open questions of this type, the interview led to freer conversation and interesting openings in the discussion. The open-endedness of the time spent with the participants consequently enriched the data collection. Adding the technolinguistic biography to the ethnographic approach was backed by the notion that "we need to learn about what [users] know and what they do" (Page & Barton 2014: 130) in order to profile their attitudes toward a certain practice. In turn, analysing the answers of all participants made it possible to discuss the actual position that WhatsApp Voice Messages occupy in the realm of CMC. I shall now detail how I went about this analysis.

Set-up
The participants in this study were mostly people I previously knew (friends, and friends of friends). I met several participants by discussing my research with acquaintances who put me into contact with new participants. Although this qualitative analysis does not claim that its results are generally representative, participant selection was nonetheless balanced by gender, with five male and five female participants, aged between 22 and 26 years.
Before the beginning of the interview, participants were asked to read and sign a research consent form and a research information sheet. These documents address all relevant ethical issues as well as privacy concerns that could be raised by participating in this study. In these forms, it was made clear that participants would give me access to 5 to 6 minutes of Voice Messages and that the interview would be recorded. The collected interviews and Voice Messages were systematically given a label based on chronological order and gender. For instance, 'S1F1' is the first participant (S1) and the first female participant (F1), and 'S6M3' is the sixth participant (S6) and the third male participant (M3). For privacy purposes, all participants will be referred to by their labels throughout the study.
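The labelling scheme described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function names `assign_labels` and `parse_label` are hypothetical, and the participant order in the usage note is invented for the example.

```python
import re
from typing import List, NamedTuple


class ParticipantLabel(NamedTuple):
    overall_rank: int  # position among all participants (the "S" number)
    gender: str        # "F" or "M"
    gender_rank: int   # position among participants of the same gender


# Pattern for labels such as "S1F1" or "S6M3".
LABEL_PATTERN = re.compile(r"^S(\d+)([FM])(\d+)$")


def assign_labels(genders: List[str]) -> List[str]:
    """Build labels from genders given in chronological order of participation."""
    counts = {"F": 0, "M": 0}
    labels = []
    for rank, gender in enumerate(genders, start=1):
        if gender not in counts:
            raise ValueError(f"unexpected gender code: {gender!r}")
        counts[gender] += 1
        labels.append(f"S{rank}{gender}{counts[gender]}")
    return labels


def parse_label(label: str) -> ParticipantLabel:
    """Split a label such as 'S6M3' into its three components."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a participant label: {label!r}")
    return ParticipantLabel(int(match.group(1)), match.group(2), int(match.group(3)))
```

With an invented order of participation, `assign_labels(["F", "M", "F"])` returns `["S1F1", "S2M1", "S3F2"]`, and `parse_label("S6M3")` yields the sixth participant overall and the third male participant.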

Data collection
During the interview, I asked the questions in a flexible order, depending on the direction of the participants' responses, to foster the conversational nature of the exchange. The ethnographic questionnaire was therefore semi-structured. Had it been structured, the participants' contributions might not have been as enthusiastic and spontaneous as with this format. The unconditional acceptance of the participants' discourse was undoubtedly positive for the general atmosphere of the interview. The recording was then systematically analysed by reassigning each question to the medium or social factor it addressed.
Interviews were recorded either with the 'Dictaphone' app of a smartphone or, on one occasion, with the recording mode of QuickTime Player on a computer. The recording device was placed on the table that separated the researcher from the participant to enhance sound quality. When the recordings were done, they were labelled immediately and sent to a personal computer, where a backup was kept and where they could be consulted for transcription purposes.
Apart from certain interviews, which were held in my apartment with the more familiar participants, most interviews took place in public spaces such as the cafeteria or my office at the University of Lausanne, and occasionally in the emblematic 'Café Romand' in the centre of town. Once the recording was underway, participants were always addressed informally and the first question thus emerged from an open conversation, which added to the authentic feel of the interviews. Incidental interactions with waiters or passers-by occasionally occurred, but were never disruptive enough to significantly influence the interactional flow, nor, hopefully, the collected data.
All interviews were approximately an hour long. I listened to each of them and transcribed them in a large Excel spreadsheet listing each and every question and answer. Since each question was linked to a certain medium or social factor of Herring's classification, the answers could conveniently be regrouped by factor, followed by the answers to the technolinguistic biography.
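The regrouping step described above, from answers listed in interview order to answers sorted by factor, can be sketched as follows. This is a hypothetical illustration rather than the spreadsheet workflow actually used: the function name, the row layout and the participant codes "P1" and "P2" are assumptions, and only the link between Question 2 and "Message transmission" comes from the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One row per answer: (participant code, question number, answer text).
AnswerRow = Tuple[str, int, str]


def group_answers_by_factor(
    rows: List[AnswerRow],
    question_to_factor: Dict[int, str],
) -> Dict[str, List[Tuple[str, str]]]:
    """Regroup answers under the factor that each question addresses.

    Questions without a known factor are collected under "Unclassified".
    """
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for participant, question, answer in rows:
        factor = question_to_factor.get(question, "Unclassified")
        grouped[factor].append((participant, answer))
    return dict(grouped)


# Only this mapping is stated in the text (Annex 1, Question 2).
QUESTION_TO_FACTOR = {2: "Message transmission"}

# Placeholder rows; participant codes and answer texts are invented.
example_rows = [
    ("P1", 2, "(answer text)"),
    ("P2", 2, "(answer text)"),
    ("P1", 5, "(answer text)"),
]
grouped = group_answers_by_factor(example_rows, QUESTION_TO_FACTOR)
```

Keeping the question-to-factor mapping as a separate argument means the same rows can be regrouped again if a question is later reassigned to another factor.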
Having dealt with the user-oriented analyses, I turn in the next section to the actual Voice Messages collected from the participants, examined through the lens of Conversation Analysis.

A Conversation Analytical (CA) approach to WhatsApp Voice Messaging
The actual online behaviour of WhatsApp Voice Messaging users is examined through a Conversation Analysis (CA) investigation. I transcribed several extracts of Voice Messages in order to find, as Conversation Analysts would say, "the machinery, the rules, the structures that produce and constitute [the] orderliness" of social actions (Psathas 1995: 2) when they are mediated by WhatsApp Voice Messaging.

Data collection
As previously mentioned, I collected five to six minutes of Voice Messages from each participant. Mostly, participants sent me the relevant Voice Messages directly through WhatsApp and I transferred them to my personal computer and immediately deleted all identifying information.
The way in which the participants use language in their Voice Messages seems to be largely similar to everyday co-present face-to-face conversation. Yet, when examined closely, some structures of interaction appear to be altered. I listened attentively to the entirety of my corpus of Voice Messages several times and took notes whenever a noteworthy extract occurred. I then selected 42 passages ranging from 5 to 30 seconds for further scrutiny and uploaded them to IMPACT, a "Tool for Transcribing and Commenting on Oral Data, for Teaching, Learning, and Research" (Jacquin 2015) that is used by linguists at the University of Lausanne for research related to Conversation Analysis.
On IMPACT, I transferred my extracts to a file that was specifically created for my thesis. Next, I created seven category labels (such as "opening", "hesitation", "question", etc.) and applied one or more of them to each extract to indicate why it merited linguistic scrutiny. A thorough exploration allowed me to finally select 12 of the most expressive extracts, covering each of these categories, for transcription in IMPACT and further analysis.

Transcription
The IMPACT tool makes it possible to reduce the playback speed of the uploaded sound sample, so that the researcher can conveniently re-listen to extracts second by second, over and over again. This greatly facilitates the transcription process and makes it possible to discern micro-level phenomena in the extracts under consideration. Unlike the synchronous chat-room interactions studied by Christopher Jenks (Jenks 2014), WhatsApp Voice Messages are asynchronous, which directs attention while transcribing away from overlapping, simultaneous or contiguous speech, phenomena that only occur in synchronous interactional contexts. IMPACT's transcription interface is shown in Figure 2.

Figure 2. Interface of the IMPACT transcription platform
On the left side of Figure 2, the precise extract name: S6M3-4.42-53 appears. This refers to seconds 42-53 of the fourth Voice Message collected from subject S6M3 (sixth subject, third male subject). The private status ('état: privé') of this page, the category labels ('étiquettes') that were attributed to the extract as well as the sound file ('source') appear under the extract name. The transcription of this specific extract appears on the right side of Figure 2.
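The naming convention for extracts can be read mechanically, as the following sketch shows. It is an illustration only: the pattern is inferred from the single example given above (S6M3-4.42-53), and the names `ExtractName` and `parse_extract_name` are hypothetical rather than part of IMPACT.

```python
import re
from typing import NamedTuple


class ExtractName(NamedTuple):
    participant: str     # participant label, e.g. "S6M3"
    message_number: int  # which collected Voice Message of that participant
    start_second: int    # where the extract starts within the message
    end_second: int      # where the extract ends within the message

    @property
    def duration(self) -> int:
        """Length of the extract in seconds."""
        return self.end_second - self.start_second


# Assumed layout: <label>-<message number>.<start second>-<end second>
EXTRACT_PATTERN = re.compile(r"^(S\d+[FM]\d+)-(\d+)\.(\d+)-(\d+)$")


def parse_extract_name(name: str) -> ExtractName:
    """Split an extract name such as 'S6M3-4.42-53' into its components."""
    match = EXTRACT_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an extract name: {name!r}")
    participant, message, start, end = match.groups()
    if int(end) < int(start):
        raise ValueError(f"extract ends before it starts: {name!r}")
    return ExtractName(participant, int(message), int(start), int(end))
```

For the extract shown in Figure 2, `parse_extract_name("S6M3-4.42-53")` gives participant S6M3, message 4, seconds 42 to 53, and a `duration` of 11 seconds.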
The transcription conventions used in Computer-Mediated Spoken Interaction concentrate on "capturing the highly granular nature of online spoken communication [that] is crucial to conducting rigorous CMC research, as micro features of talk are central to meaning construction in CMSI environments" (Jenks 2014: 43). As the transcription conventions used are slightly different from those usually implemented on IMPACT, it must be made clear that combining the most relevant transcription platform and transcription conventions for this study is a conscious choice (the detailed transcription conventions that were used for the selected extracts appear in Annex 3).

Analysis preparation
I subdivided my subsequent analysis into seven sub-sections, which characterize the salient linguistic phenomena that emerge in Voice Messaging. Each sub-section is addressed in turn with the help of one or more excerpts taken from transcriptions as they appear in IMPACT. They are complemented by an English translation for the reader to have an idea of the content.
Some of the extracts that were selected to illustrate one sub-category inevitably overlapped with others. For instance, a closing sequence in which the speaker covered several topics could be categorized for commentary in sub-sections B. and/or G.
Next, I turned to the broader perspective of where WhatsApp Voice Messaging fits in the field of linguistic research.

A Place to be for WhatsApp Voice Messaging
In this section, I proceed towards a discussion of the particular status taken up by WhatsApp Voice Messaging in the realm of CMC, thanks to the various samples of collected data, which were scrutinized through the lens of distinct theoretical approaches. Knowing that all the material I collected up to this point informs us in a certain way about the place we can attribute to WhatsApp Voice Messages in Herring's classification of Computer-Mediated Communication, I set out to characterise it in an empirically informed manner.

The synoptic table
The central contribution of this chapter is a synoptic table, which is particularly useful for characterising WhatsApp Voice Messaging by means of Herring's factors. The synoptic table (see Table 2) gives an overview of the components and subcomponents of Herring's medium and social factors, as well as the methods I employed to deal with each specific criterion with regard to WhatsApp Voice Messaging.
The right-hand column of Table 2 summarises the broad trend of participants' answers to the questions related to the components of each factor. This table enabled me to detect which factors are straightforward and which would need further scrutiny. Moreover, since Herring explicitly states that her classification is "open ended" and that "additional factors can be added as justified by evidence that they affect online discourse" (Herring 2007: 11), I incorporated at the bottom of the table three "other observations" I made while analysing my data.

The discussion and analysis
From there on, I set out on an in-depth discussion and analysis of my data in order to put forth the constructive and functional values of Herring's classification when applied to a new form of CMC, and to aptly adapt and redefine some of its characteristics. I brought together a number of theoretical inputs with other […]
In the next section, I give a more precise view of how the results, discussion and analysis were presented in my 'mémoire' through three sample sections.

Result samples
Because of space restrictions, I decided to summarise the results I obtained for one medium factor, one situation (social) factor and a chosen "other observation" I came to through the methodological design of my study.

Medium factor: persistence of transcript
This factor (1.c in Table 2) relates to "how long, relatively speaking, messages remain in the system after they are received" (Herring 2007: 15), which is quite straightforward when it comes to this Instant Messaging platform. Unless the user deletes messages, they are kept in the system and can be consulted at all times. The ethnographic questionnaire showed that participants appeared to be aware of this factor, as some admitted being self-reflexive with regard to the positive or negative emotivity or even the intimate nature of what they said in their Voice Messages. Answers given were for instance:
"Never complain [or get angry] in a Voice Message" since it is not a constant state, "leaving an audio trace of it is not cool" (S8M4)
"You never know where, when and who will hear what you say" (S7F4)
Indeed, they were aware that the persistence of their Voice Message through time could lead to future misinterpretations based on context. Furthermore, listening multiple times to a Voice Message that was received was deemed useful when it came to recollecting the contents of one that included extensive information. On a different note, re-listening to a sent Voice Message was mainly perceived as an egotistic assessment of one's own performance. These results all pointed to the fact that producing a Voice Message is not only subject to the constraints of orality, but also to the consequences that re-listening can have.

Social factor: topic
The topic (2.d in Table 2) on the exchange level refers to "what participants are actually talking about" (Herring 2007: 20). The ethnographic questionnaire and the CA approach both made it possible to study the dynamics behind the selection of, and the shift between, topics of interaction. Mainly, participants admitted using Voice Messages to 'talk about life' and to 'tell jokes', which highlights the private and a fortiori light-hearted component of the contents sent through this medium.
Remarkably, it was shown that one Voice Message corresponds neither strictly to one turn in conversation, nor to addressing a single topic. One Voice Message is indeed rarely mapped onto one turn-construction unit, as for example in Extract 1 below.

Extract 1
Quite clearly, Extract 1 consists of two parts: answering a question, and then asking a question in relation to a previous conversation. The two-second pause marks the transition from one to the other. This short pause in speech flow (which, it must be noted, would never occur in face-to-face interaction) was assessed to be a buffering moment in which the speaker assembles the topics they want to discuss during a Voice Message. Moreover, this example shows how the affordances (the possibilities for action) of Voice Messaging, such as the persistence of the transcript, the limitless length and the effortless use of the audio channel, can lead to this unusual linguistic structure. We can also observe how the intonation is tailored to the topic and to the speech act performed.
These results feed our understanding of the complex nature of how information is conveyed and managed through this type of CMC.

Other observation: Bus discomfort
In the ethnographic questionnaire a number of participants spontaneously noted their discomfort when using this technology in public, especially in public transport. I therefore named this factor "bus discomfort" (1.c in Table 2). When asked further, they attested that making a phone call in public felt less embarrassing. I have two remarks on this point.
First, participants mentioned that people "think I'm crazy" and that it "feels weird to be alone speaking to my phone" (as an answer to Question 17 in Annex 1). We can infer from this that the status of Voice Messaging as an emerging online platform and the limited familiarity of the general public with this form of CMC are at play here. Secondly, in public spaces, peripheral listeners have access to the entirety of Voice Messages which, as we have seen, are often aimed at close relations and thus arguably contain a reasonable amount of private information. We can conclude that a general underlying reluctance to expose what is being sent to a close relation, while using intimate intonation patterns, was a determining factor when it came to producing Voice Messages in public spaces.
I therefore deemed this discomfort to be in relation to the novelty of this CMC platform as well as to the high degree of self-disclosure involved. I then argued that this form of discomfort must be taken into account when describing and discussing any asynchronous form of online spoken interaction in the future. Therefore, some additions ought to be made to Herring's classification in order to accurately describe what is at stake with WhatsApp Voice Messaging.

Limitations
This qualitative exploratory study approaches the novelty of Voice Messaging with certain methodological limitations. The ethnographic questionnaire was specifically designed to adhere to Herring's classification, which, however, was not sufficiently comprehensive to integrate all parameters of the collected data. The technolinguistic biography provided interesting insights, albeit not all directly relevant to Herring's classification. This approach could be improved by a two-step study separating the questionnaire from the technolinguistic biography, which would also allow participants to think through some answers. Moreover, the CA approach was conducted solely on excerpts isolated from their broader context, thereby constraining the analysis of interaction management. The study of different populations of users (e.g. in working contexts, or in different age groups) would certainly help to clarify certain aspects and to detect other specific characteristics.

Conclusion
WhatsApp Voice Messaging represents an emergent and rising type of Computer-Mediated Communication (CMC). This contribution describes the methodology I developed and implemented in my 'mémoire' thesis, which aimed at studying and classifying WhatsApp Voice Messaging in the existing body of linguistic literature. Some valuable ideas stemmed from previous research, in particular the possibility to employ the factors defined by Herring's classification of CMC. This led to the development of an ethnographic questionnaire and a technolinguistic biography, which constituted the basis for interviews with ten WhatsApp Voice Messaging users. The transcription and analysis of this data made it possible to specify the characteristics of medium and social factors and even to detect some new elements meriting attention. Furthermore, a conversation analysis was carried out on transcribed extracts of Voice Messages, which added a complementary perspective to the previous results. A summary of the discussion and analysis for three sample results is presented. These results also highlight the advantages of assembling complementary findings from a multi-method approach, thereby characterising WhatsApp Voice Messaging in a systematic and comprehensive way.
I can conclude that this combined analysis was instrumental in our understanding of the ins and outs of this Instant Messaging online platform, and that it brings to the field a collection of compelling observations that must be taken into consideration for future research on WhatsApp