Perception of speech and music

One of the key questions articulated in this project is how the use of speech gestures as material in improvised music affects the perception of both music and speech. In this chapter I will try to present some of my observations and reflections in this regard, relating to the different modes of perceiving speech and music, how different sound sources influence the frame of perception, and different perceptions of time in both speech and music.

Semantic and aesthetic modes of perception

The most fundamental difference between speech and music seems to me to lie in perceiving speech as semantic content and music as aesthetic form.

In his famous 1917 essay Art as Technique, the Russian writer and formalist literary critic Viktor Shklovsky describes how perception of everyday phenomena is usually highly automated through familiarisation, to the point where they are not actually seen any more but only recognised as symbols. Through the technique of de-familiarisation, Shklovsky notes, art can turn attention back to the sensation of their features and forms, as if they were experienced for the first time (Shklovsky, 1965). Similarly, from a cognitive perspective, Per Aage Brandt proposes that the mode of perception is determined by the context – whether the experience is strongly framed or not (Brandt, 2006). In the trivial and unbounded stream of everyday experience, perception is pragmatic and oriented towards content and action. In a strongly framed aesthetic context, the mode of perception is extraordinary, an intense mode of form-oriented hyper-perception. For Brandt, the concept of form in art is key to this framing of reality that enables aesthetic perception. This means that everyday speech, by its very everydayness, is by default perceived through this pragmatic, content-oriented mode of perception. Even when formal operations like abstraction and fragmentation introduce an aesthetic framing, there still seems to me to be a cognitive split between recognising and focusing on the everyday speech sources through this pragmatic mode, and perceiving music as abstract aesthetic forms. In my experience, it is very hard to focus on both the semantic and musical aspects of ordinary speech at the same time. Instead, perception seems to flip between the two modes, with one always being in the foreground depending on whether I know the language, how intelligible the words are, and whether there is a clear narrative or story that draws attention.

This is not necessarily just about aesthetic framing. Perhaps it also has to do with the way music and language seem to be processed in different parts of the brain. The fascinating split-brain research by Roger Sperry and Michael Gazzaniga, carried out on severe epilepsy patients who as a last resort had the connections between their brain halves cut, has provided some interesting insights into how processing in the brain is divided (Gazzaniga, 1967). According to this research, the left hemisphere seems to be in charge of intellectual processes like storytelling and creating narratives and semantic meaning, while the right hemisphere deals with sensations, and spatial, emotional and musical cognition. One way to look at this is that we are equipped with two parallel perceptual systems, in effect creating two parallel experiences competing for attention, and that might explain the difficulty of retaining focus on the musical forms of speech when the spoken narrative grabs our attention.

Sound sources as framing

This tendency for semantic content to capture the main perceptive focus is the background for my whole methodology of abstraction in this project, reflected in the reformulated aims to explore the musical and communicative potential of speech primarily as vocal prosodic gestures. In addition to the abstraction methods offered by signal processing, this also extends to the actual sound production setup of different physical sound sources. Contrary to Schaeffer’s ideas of an acousmatic “reduced” listening, I believe that the sources and inferred causes of sounds will always play some part in our perception of sound. So rather than suppressing this tendency, the active play with sound sources is embraced as a central feature in this project. The use of a whole range of different-sounding loudspeakers and resonating bodies creates a layer of formal differentiation that shifts the attention to their sonic characteristics and to the sources as entities in the room. The use of these hybrid acoustic instrument-loudspeakers can render speech perfectly intelligible while at the same time colouring the sound just enough to create an aesthetic framing. This framing also creates a contrast to the conventional loudspeaker sources and enhances the perception of their qualities and connotations as well. The sound of an acoustic instrument somehow means music, and creates a frame of reference that invites musical listening, while speech mediated through a low-fidelity radio has connotations of broadcast and public address. In contrast to these physical objects, the invisible sound wall produced by stereo loudspeakers creates imagined, virtual sonic spaces. New and interesting perspectives for listening appear when these sonic realms start to blend in an orchestration of different sound qualities and physicalities.

Time perception

Another interesting aspect of perception contemplated in this project is different experiences of time. One common kind of time perception relates to the narrative, creating an expectation of a linear story to be told, unfolded from start to end in a forward motion through time. This is very different from the kind of time experienced in the moment of unplanned, responsive interplay, like in a spoken conversation or in an improvised musical exchange. Even with a shared experience of what has happened so far and what might happen next, the present is somehow accentuated by the very fact that any response is unknowable before it happens, reflecting the direct etymological meaning of the improvised as something unforeseen. The dynamics and the risks and rewards of that moment can result in a feeling of an expanded present, a continual now. Indeed, according to the renowned improviser Derek Bailey, improvisation can even be considered a celebration of the moment (Bailey, 2004). This is an experience of time that is qualitatively different from the linear narrative, and one that is fundamental in improvised music.

In addition to these two ideas, some further concepts have been adopted for thinking about time in this project: the standstill of a static soundscape or a sparse background, and the circular time experienced with fragmentation into cyclic repetition. Together, these four perspectives on time perception have emerged in this project as a result of – and played a part in – the exploration of differences and similarities in speech and music:

Forward linear motion
Narratives are natural in both speech and music, as a story with an implicit dramaturgical development. That can be a monologue, the presentation of documented dialogues (where the act of presenting this dialogue becomes a narrative), or any kind of public address, like the very situation of a musical performance, where conventions create an expectation that performers will walk on stage and produce sound intentionally for an appropriate period of time before the performance ends and the audience applauds.

Attentive present
The alertness of the moment is also a feature of speech and music alike, appearing in every unexpected response and unforeseen twist in everyday conversation and improvised music, perhaps best exemplified by the concept of the question.

Cyclic time
Repetition, on the other hand, is a phenomenon that is more related to the physical experience of body movements, of dance or manual labour, or the repetitive or cyclical structure of movement in the physical world in general (raindrops, heartbeats, respiration, walking, machinery, weather, seasons, planets). Repetition for its own sake is not often encountered in everyday speech except to emphasise something, but is very much present in the aesthetic domains of music, poetry and ritual. Indeed, repetition is a defining feature of so much music from all cultures around the world that it is stressed by some cognitive researchers as “a fundamental characteristic of what we experience as music” (Margulis, 2014, p. 5). The use of cyclic structures with speech will quickly draw attention to its formal structure in what is known as semantic satiation, framing it within an aesthetic mode of perception and shifting the focus away from the semantic content. For most purposes in everyday spoken communication, this defeats the very purpose of communicating. But for the purpose of making music, allowing varying degrees of repetition with the speech material makes it possible to play with this clear marker between language and music and direct the mode of perception towards one or the other.

Static time
Stillness complements these modes of time perception as the opposite of intentional communicative gestures, as the indifferent (but perhaps expectant) background without the sound or movement of any acting agents. It offers other perspectives on sound as a phenomenon in space, which was one of the reasons for making a sound installation version of my performance concept – a sound installation without performers or linear narrative, but with the possibility to walk around and structure the experience spatially.

These are some of the ways the use of recorded speech in improvised musical settings has highlighted the shifting modes of perception of both speech and music in this project. This has also played a part in developing thoughts about form, something that will be presented in the next chapter.


Bailey, D. (2004). Free Improvisation. In D. Warner & C. Cox (Eds.), Audio Culture: Readings in Modern Music (pp. 255–265). New York: Continuum.

Brandt, P. A. (2006). Form and Meaning in Art. In M. Turner (Ed.), The Artful Mind (pp. 171–186). Oxford: Oxford University Press.

Gazzaniga, M. S. (1967). The Split Brain in Man. Scientific American, 217(2), 24–29.

Margulis, E. H. (2014). On Repeat: How Music Plays the Mind. Oxford University Press.

Shklovsky, V. (1965). Art as Technique. In L. T. Lemon & M. J. Reis (Trans.), Russian Formalist Criticism: Four Essays (pp. 3–24). Lincoln: University of Nebraska Press.
