Speech sources and concept

This chapter presents some thoughts on the process of choosing speech sources for this project, and the challenges resulting from having to relate to the contextual and conceptual implications of using sources taken from real-life situations.

This has to do with the relation between the musical reality of sound and the conceptual reality of words, and describes some of the reasons for the move away from the semantic content of speech in this project, as well as the actual kinds of speech sources I ended up using.

Speech sources

Early on in this project I considered basing the performance concept on live speech as musical sources during performances – readings, poetic improvisations, acting, or perhaps other kinds of staged speech. However, I quickly became aware that I was more interested in the interplay going on in everyday, real-life conversations, without the implications of performative public or theatrical speaking styles used when standing before an audience.

As described in the chapters above, I chose to use speech genres as the main perspective to approach the abstract gestural musical content of such conversations. To explore variations in such speech genres I needed to gather recordings from a broad range of different social situations, such as intimate conversations, formal and informal dialogues, quarrels, public encounters, interrogation, negotiation, confessions etc. After doing some tentative recordings of my own, I realized that I would not get myself into a lot of such situations with a microphone and recorder very easily, and started to search for other sources. During that search, I listened to a lot of different recorded speech, and I was often struck by how sensitive perception is to subtle nuances in character. For instance, at some point I thought that actors ought to be experts at speech genres, so I turned to recordings of acting on screen, on stage, from radio drama, improvised theatre etc., but when listening to these recordings I found that these acted situations also constitute their own speech genres, differing slightly from what we would expect if they were for real and not acted. We expect acting to conform to these acted styles, which is part of the message that it is just fiction after all, even though it often portrays reality. Perhaps this ability to discern authenticity and sincerity from a staged or acted performance is linked to our extremely social nature and almost obsessive preoccupation with what other people are really thinking, feeling and planning, which is not always obvious from what they actually say. We are experts at interpreting speech intonation and figuring out if that laughter was authentic or acted, if that excuse was really heartfelt or just courtesy.[1]

Conceptual framing: a search for the non-specific situation

The decision to use authentic conversations as sources was the reason that I had to rely on recordings of speech instead of spoken word on stage. Those recordings have to come from somewhere, and as speech can potentially be sourced from anywhere in the sphere of human activity, this somewhere can potentially bring all sorts of new content into the mix.

My first aim was to gather a wide range of speech genres that could be starting points for musical explorations. When working on this I felt the need for an overall concept regarding which recordings to use: an idea about where and when these conversations took place that could emerge as a theme throughout the project.

That set me off in many different directions. One of the first usable sources I found was a reality television series used for linguistic research at my university. Though from a narrow demographic selection, this material included a wide range of different speech genres both formal and informal, happy, angry, sad, personal, public, leisurely etc. The participants were recorded day and night for a long period of time, and a bit into the season they seemed to act quite naturally even in this unnatural setting. This is probably also why this material has been used for linguistic research on natural speech. However, the somewhat ironic conceptual framing of a TV show with all its references to popular culture was not something I was looking for.

Another concept I looked into was the politics of power, searching for, and listening to, covert and private recordings of conversations by people in powerful positions. But away from the public spotlight, these people tend to sound very commonplace, very much the opposite of what we get from acted portrayals of such people on screen. Perhaps this can be viewed as a parallel to what Hannah Arendt famously described as the banality of evil, in the sense that even though people in such powerful positions can potentially affect the lives of millions of others by the very words they utter and the decisions they make, the sound of them speaking can actually be as trivial as any other everyday conversation.

When looking at possible sources I also constantly ran into the typical anthropological problem of how to observe and record authentic situations without affecting them by the very presence of a microphone. This led me to consider surveillance recordings as a possible source of authentic situations. Apart from the obvious ethical considerations I thought this approach could also become an interesting topic in itself, providing a conceptual frame for the musical explorations and also a relevant comment on the trend towards more surveillance and interception of mass communications that we see in society today.

I contacted the Stasi Archive in Berlin and got permission to obtain and use actual surveillance recordings that agents at the Ministerium für Staatssicherheit recorded in East Germany during the 1980s. But when trying to use this material, it became clear that the strong connotations and the depressing context made it far too powerful to use the way I had intended. The context simply took over and made it all about living in a totalitarian surveillance state while the music became secondary. While this material was clearly very interesting, it was an entirely different project than the one I was working on.

To continue with the musical exploration of speech I instead sought to gather a wide range of less historically specific recordings from different public sources like radio and television and from anonymised linguistic databases of recorded natural speech, as well as making recordings of my own.

When working with this diverse material over time, it became evident that while for instance the reality-TV recordings were very rich in expression, what worked best in practice was actually using the least recognisable sources. Especially when I was improvising with another performer, such easily identifiable sources with a common popular cultural reference from mass media really seemed to stick out and conflict with the on-going musical discourse. I had an exchange about this with my supervisor Øyvind Brandtsegg that perhaps can shed some light on these issues:

Daniel: It seems to be a problem when bringing too much contextualized content into an otherwise non-contextualized musical situation, especially when improvising with another musician. It takes the focus away from what is happening musically.
On the other hand, I also feel a need to have a consistent selection of material constituting a conceptual framing of the project.

Øyvind: This is an interesting issue that is somehow central to your project.

Daniel: Yes, it is an important question for the whole project. I have been contemplating this from the very start, but up until now it has been more pressing to get things to work technically and musically.

Øyvind: Are you perhaps looking for some way of staging the music? A context that provides a direction but at the same time freedom?

Daniel: Yes, some sort of anchor point, framing or approach that clearly defines a field within which I can operate freely.

Øyvind: Has this to do with the literary content, situation, mood, setting, or also that the dialogue is bound in time?

Daniel: I think it probably has most to do with the connections to the concrete reality of any recognisable persons, and the context that this imposes on the music.

Øyvind: Is this comparable to the content of lyrics in vocal music? Or the stories that are told in spoken word music? (Laurie Anderson, Golden Palominos with Nicole Blackman, etc.)

Daniel: Yes, me and (fellow performer) Tone Åse talked about that. She felt the same about using written texts in an improvised setting, and for this reason would often choose the most abstract texts. I think this is more of a problem in improvised music where the music is already about the interaction of musical ideas, than in a composition that already relates to the textual content.

Øyvind: Do you need one solution for this or could you do different things for different musical results?

Daniel: No, not necessarily one final solution. Different pieces and musical settings could use different solutions.

Øyvind: What is it about ensemble improvisation that makes this problem more precarious?

Daniel: Perhaps improvised musical dialogues seek to establish a common formal language that creates its own abstract context, and when bringing in material with an external context pointing outside the time and space of here and now, this creates a conflict about what we are dealing with.

So, what I have been looking for, I think, is something that can represent the idea of generalised social situations, without being so specific that one gets involved in each individual story. At the same time, I felt that it has to be a sort of consequence, an overall idea or concept that makes clear why I am using this or that source, but without taking over as the sole content of what the music is about.

Using the Stasi surveillance recordings introduced too much context, to the extent that it completely took over as content. To record conversations myself could have been preferable conceptually, but that would result in a very narrow selection of languages and situations. And what would the subject be? A portrait of me and my close environment? At this point, that was not something I was interested in.

One possible solution was to mix a large number of less recognisable sources. There is still no overall theme or conceptual content other than the diversity of language and expression, but if the main subject is the generality of language and music then this is perhaps how it has to be. The diversity of human experience would be the overall concept. Another way to look at this question of conceptuality is that the focus on speech genres already is a concept that dictates what kind of material I need to use, and that additional conceptual framings would ultimately conflict with this concept of generality. This is above all a formal concept, and one that fits well with my fundamentally formalistic approach to music. This is also why I think of any outcome of this project primarily as music and not as sound art.

As these thoughts show, it became necessary – at least for the time being – to somehow do away with the semantics of speech to be fully able to focus on the nonverbal content and the connections to improvisation and musical communication.
To reflect a bit further on this choice of not focusing on the semantic content and its implicated narratives, contexts and historically situated references: The point was to move beyond the conceptual reality of words and focus on the part of language we learn before the words – the communicative vocal gestures, and to explore interaction in conversation in relation to the kind of physical/sonic meaning-making going on in improvised music. This, I think, is on a philosophical level also related to the meaning and function of music as a social phenomenon: playing music primarily as a meaningful way of being (and thinking) together, and not just making a product for consumption – echoing the emphasis by composer Cornelius Cardew on music as a community activity, and the view that the audience (i.e. concert tickets) was a capitalist invention. This line of thought is also linked to the ideas of Roland Barthes that there are two musics – the music one listens to and the music one plays (Barthes, 1977), describing how the music one plays is not just perceived as sound but that its meaning is actively created and comprehended through the gestures of the body (Barthes calls this muscular music). This physical, and in the case of shared rituals like for instance church songs, also very social meaning of music, is what I have tried to show as a parallel to the likewise very social nature and physical togetherness that is a large part of the function of conversation.
Perhaps the approach in this project can be viewed as more of a search into the traces of musical meaning-making that can be found in speech, rather than dealing with all the other (and equally interesting) aspects of speech, such as words, poetry, semantics, voice, identity, personality, community, stories, history, society, ideas, concepts and so forth. In other words – the (physical) act of speaking together rather than the spoken word as a concept seems to be the core of what I have been trying to use as the focal point in this project.

Speech recordings

Following the thoughts above about conceptual framing and semantic content, I developed an approach where I tried to use as many different sources of recorded speech as possible, deliberately avoiding any clear contextual markers. In contrast, in some speech-based music the process of collecting and recording material can sometimes be the very starting point and raison d’être for a musical piece, making up a large part of the actual work of making the music. For the piece “Encounters in the Republic of Heaven”, Trevor Wishart reportedly spent a year just establishing contacts and recording local sources, and then an additional 18 months editing and cataloguing the recordings (Wishart, 2012, p.136). Compared to this, I did not know what kind of material I needed for this project until I was far into the process. Investigating what could work as material and exploring the particular effects of using this or that kind of source became integral parts of the research process in this project.

The reflections above give an account of some of this process, and how I arrived at the conclusion that in practice, what turned out to work best was to use a mix of many different sources of recordings: my own recordings, shared community recordings, clips from reality-TV, many different linguistic corpora, surveillance recordings, court recordings, telephone recordings, radio broadcasts and documentaries. These are only some of the kinds of sources I have explored:

Santa Barbara Corpus (linguistic corpus of natural speech)
Big Brother television series (TV Norge 2001)
CallFriend and CallHome (linguistic corpora of telephone conversations)
archive.org (community recordings, field recordings, religious sermons)
talkbank.org (collection of many different linguistic speech corpora)
BBC podcast – “The Listening Project” (recorded community dialogues)
News broadcasts
ATCOSIM (Air Traffic Controllers simulation speech corpus)
Emotive speech corpora (Emovo, EmoDB, Ryerson)
Supreme Court of the US archive
Nasjonalbiblioteket (National Library of Norway – historical recordings)
Stasi surveillance recordings
Watergate covert recordings from the Nixon Library
Emergency telephone recordings (linguistic corpora of emergency calls)

Some of the most interesting sources to use were telephone conversations. A phone call usually has a clear closed form with a definitive beginning, middle and end, reminiscent of (western) narrative musical forms. It is also the main reference for interacting by means of the voice alone. As the voice through the receiver is the sole medium of communication, and no body language or other visual cues are available, the prosody, intonation and speech genre become very important aspects of the interpretation. This is also why I chose to use a real telephone for the sound installation in the final presentation of artistic results. Perhaps this particular mode of communication has the potential to be made into an overall conceptual framing in future work.

  1. As a side note to my decision to focus on real conversations as material, it must be said that I later also found it interesting to use recordings of highly emotive speech by actors making dramatized stereotypes of different emotions. While not sounding convincing at all, they nevertheless had a kind of almost abstract poetic quality. Perhaps in the way they sounded stylized, they actually came closer to poetry and music.


Barthes, R. (1977). Musica Practica. In Image Music Text: Essays selected and translated by Stephen Heath. London: Fontana Press.

Wishart, T. (2012). Sound Composition. Orpheus the Pantomime.
