Speech gestures as material

What exactly constitutes the material when using speech as a source for music? Since speech includes language and language conveys ideas, it could from a conceptual point of view be almost anything in the sphere of human experience that can be spoken of or inferred from speech – the historical context, the site, the identities, any ideas or topics of conversation, the narrative, the poetic qualities of words, the voice as instrument or as metaphor (voice concern, give voice to, vote), and so forth. But speech is first of all experienced physically as sound, and above all as highly structured sound, a feature it shares with music. Sound is vibration – movement – and producing sound requires physical movement of some kind. Following this simple fact one could even say that “musical experience is inseparable from the sensation of movement” (Godøy & Leman, 2010, p. 3). This is a particularly relevant observation for the experience of improvised music, where any action and sound contributed by an improvising musician becomes a gesture – a potentially significant act, an utterance that is not only subject to active interpretation but, depending on that interpretation, can also change and actively shape the further development of the musical discourse. This function of improvised musical ideas as gestures has also been emphasized by Njål Ølnes in his thorough investigation of how musical signs or gestures are used to establish musical ideas and develop them through dialogical processes into larger musical forms or gestalts (Ølnes, 2016).

In this regard, it is reasonable to consider the musical gesture as the basic musical unit in the improvised exchange of sonic ideas. In other contexts, the concept of gesture may have other meanings and can refer to a metaphor or just describe an action, but it is in the sense of a communicative act – a movement to express meaning – that it is used here.

My wish to explore speech from the perspective of instrumental improvised music meant that such musical gestures – wordless musical utterances in improvised interplay – formed the musical background and approach of this endeavour. That resulted in a decision to focus primarily on the abstract prosodic qualities of speech, and not on the words and semantic content or other aspects and concepts associated with language, voice, identity, site, story, etc.

This is the main approach to speech I have intuitively adopted throughout this project, viewing spoken utterances primarily as musical gestures: non-conceptual, but with potential musical meaning. Conversation is treated not as a discussion of ideas and concepts, but as a sequence of non-verbal actions and interactions, parallel to how musical gestures make up the wordless discourse of improvised music.

This approach led me first to look into the linguistic fields of prosody and conversation analysis, to learn what functions and significance this musical foundation of prosody might carry in language seen from a linguistic perspective.

Prosodic phenomena

In “The Music of Everyday Speech”, the linguist Ann Wennerstrom gives a thorough account of how prosodic features are actively used to structure utterances and convey information in conversations (Wennerstrom, 2001). Examples of such features include how strong accents are typically used to highlight the most important words, while high-pitched syllables are used to mark new information. At a larger scale, modulation to a higher “key” is often used to signify a change of subject, and similarly, a lower mean pitch is used to signal supplementary comments, as if in parentheses.

Though linguists typically operate with phonemes as the lowest level of segmentation, the syllable is regarded as the basic unit of rhythm. A syllable is usually based around a voiced vowel, and since it has a pitch it can be viewed as corresponding to the concept of a note as a musical unit. The pitches of successive syllable-notes form melodic contours, while their timing results in a particular speech rate that can be related to the musical concept of tempo. One interesting rhythmic phenomenon in this regard is how a shared semi-regular pulse or tempo is usually adopted by speakers. The adjustment to a shared pulse also extends across turns in a conversation, with speakers often timing their responses to coincide with the pulse implied by the previous speaker. How syllables express this pulse can be quite different though, and languages are generally classified as belonging to one of two categories of timing. In stress-timed languages (e.g. Germanic languages like English, German and Norwegian), the stressed syllables are placed at regular intervals approaching an even pulse, while the unstressed syllables in between are sped up or slowed down in order to match this pulse. In syllable-timed languages (e.g. Romance languages like French and Spanish), all syllables are timed more or less according to the underlying pulse. Interestingly enough, this timing difference has even been demonstrated in music, in a study of rhythmic differences between English and French classical music (Patel & Daniele, 2003).
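The study cited above compared language and music using the normalized Pairwise Variability Index (nPVI), which measures how strongly successive durations contrast with each other. The following is a minimal Python sketch of that measure, given only as an illustration. The function name and the example duration lists are assumptions made here, not data from the study or part of this project's software.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index of a sequence of durations.

    Returns 0 for perfectly even durations; higher values mean stronger
    contrast between neighbouring durations (as expected in stress timing).
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    total = 0.0
    for a, b in zip(durations, durations[1:]):
        # Difference of each neighbouring pair, normalized by the pair's mean
        total += abs(a - b) / ((a + b) / 2)
    return 100 * total / (len(durations) - 1)


# Hypothetical syllable (or note) durations in seconds
even_timing = [0.20, 0.20, 0.20, 0.20]      # idealized syllable timing
contrasting = [0.30, 0.15, 0.30, 0.15]      # idealized long-short alternation

print(npvi(even_timing))
print(npvi(contrasting))
```

The same function can be applied to vowel durations in speech and to note durations in a melody, which is what makes the measure useful for comparing the two domains.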

These and other interesting prosodic phenomena have provided the background for identifying significant features in spoken language that could also be interesting to use as a foundation for exploring speech musically. That includes the choice of the syllable as the basic rhythmic and melodic unit, the use of both stressed and high-pitched accents for creating derivative rhythmic structures, and the ability to play with the inferred tempo by gradual rhythmic quantization to a grid derived from the underlying pulse. This prosodic background has consequently influenced design choices and is directly reflected in the particular functions of the software instrument system used in these explorations, as described in the chapter below detailing the instrument system. It has thus formed the foundation for the musical explorations undertaken during performance.
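The idea of gradual quantization to a pulse grid can be illustrated with a short Python sketch. It is not the implementation used in the instrument system; the function name, the parameters (pulse, amount, offset) and the example onset times are all assumptions made for illustration. Each syllable onset is moved part of the way towards its nearest grid point, so that amount = 0 keeps the original speech timing and amount = 1 snaps fully to the grid.

```python
import math


def quantize_onsets(onsets, pulse, amount, offset=0.0):
    """Move each onset a fraction `amount` (0..1) towards the nearest grid point.

    onsets: syllable onset times in seconds (hypothetical input)
    pulse:  grid spacing in seconds, e.g. derived from the inferred tempo
    offset: time of the first grid point in seconds
    """
    if pulse <= 0:
        raise ValueError("pulse must be positive")
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    result = []
    for t in onsets:
        # Index of the nearest grid point (ties round upwards)
        index = math.floor((t - offset) / pulse + 0.5)
        nearest = offset + index * pulse
        # Linear interpolation between original and fully quantized time
        result.append(t + amount * (nearest - t))
    return result


# Hypothetical onsets with a pulse of 0.5 seconds
onsets = [0.0, 0.45, 1.1, 1.38]
for amount in (0.0, 0.5, 1.0):
    print(amount, quantize_onsets(onsets, pulse=0.5, amount=amount))
```

Sweeping amount continuously between 0 and 1 gives a gradual transition from free speech rhythm to a strictly metrical version of the same gesture.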

Speech genres

A useful way to think about the prosodic traits of speech and their wider possible musical meanings and implications has been provided in this project by the concept of speech genres. Other perspectives could have been chosen.

For instance, one theme often raised when discussing the musical character of speech melodies is the obvious differences in intonation between local dialects, or between the cultural stereotypes associated with different spoken languages and cultures. Linguists, on the other hand, will perhaps note how the different phonetic structures of languages make them sound completely different on the articulatory level. In addition, one can observe a wide range of wildly different speaking styles caused by all kinds of personal idiosyncrasies and physical conditions relating to age, gender, health, and other individual traits such as stuttering or a hoarse voice.

However, when listening to recorded speech from many different languages and settings, I have been struck by how similar different languages and people actually sound in comparable situations. The word situation is a clue here, as the style or genre used in a given situation conveys the social context and purpose of communication. It would probably sound strange in any language or culture to speak in a very formal tone to an infant (with the obvious exception of ritual situations like baptism). One explanation for this similarity can perhaps be found if one considers the function of spoken utterances in the same way as musical gestures in improvised music – as physical gestures in a social situation – and, following that, how vocal gestures can overlap and extend the kind of meaning conveyed by body language and physical interaction. “Sound is touch at a distance”, the psychologist Anne Fernald noted, observing how parents of different cultures all started to talk to their babies in a comforting tone after putting them down, as a way of staying in touch with them (Radiolab, 2006).

I think there is something fundamental about how vocal utterances can be perceived this way as touch – as physical sensations comprehended through the wider cognitive apparatus, which includes emotions. Sound is after all physical vibration, and the physiological foundation for this kind of sensation-based cognition means that it extends across cultures and languages as something much more universal than cultural and stylistic codes, and even across species, as domesticated animals like dogs seem to have few problems interpreting intentions from speech gestures.

Returning to the concept of speech genres, these expressive styles can be viewed as formalized expressions of both such gestural sensations and social conventions. For musical purposes, this is interesting as it points to a deeper level on which music also might function as a kind of social, universal language. This becomes even more interesting when the focus is on musical improvisation from the perspective of social interaction. Based on these ideas, the motivation to use speech genres as the main perspective from which to explore possible social and musical meanings of prosodic gestures has first of all influenced what kind of speech I have used as subjects of study, but it has also defined the methods I have used for generating musical structures from this material. These methods are described in the following chapter.


Godøy, R. I., & Leman, M. (Eds.). (2010). Musical Gestures: Sound, Movement, and Meaning. New York: Routledge. https://doi.org/10.4324/9780203863411

Patel, A. D., & Daniele, J. R. (2003). An empirical comparison of rhythm in language and music. Cognition, 87(1), 35-45.

Radiolab. (2006). Musical Language [Audio Podcast]. New York: WNYC Radio. Retrieved from https://www.radiolab.org/story/91514-sound-as-touch/

Wennerstrom, A. (2001). The Music of Everyday Speech: Prosody and Discourse Analysis. Oxford University Press.

Ølnes, N. (2016). From Small Signs to Great Form – Analysis of the musical interplay in free improvisation, using the tools of Aural Sonology. Norwegian Academy of Music, Oslo. Retrieved from https://brage.bibsys.no/xmlui/handle/11250/2381966
