Presented at Generative Art 3, Milan, Italy

Listening to Life: Collaborations with Audio Organisms

Aaron Wolf Baum, Ph.D.

EternalNovelty.com

email: drfriendly@earthlink.net

 

 

1. Introduction

 

What is it about a flower, a forest, a culture, or a great symphony that makes us feel that it is, in some sense, "alive"? They share the properties of extensive and intricate internal correlations, responsivity to their environment, and the ability to surprise us continually — it is the combination of these traits that makes us recognize life. Life is what creates beauty. Its behavior and forms often seem to be in complete contrast to those of information technology; in spite of the fact that this technology has reached a level of microscopic structure that rivals that of living things, we experience it as cold and lifeless. As this technology extends its influence into our lives, this issue becomes critically important to our future: can we breathe life into our technology? In spite of their highly varied form, all systems that are experienced as being "alive" have a shared underlying structure: large numbers of nodes (neurons, genes, species, organs, catalytic molecules, etc.) interconnected in a network without central guidance. It is through the connections of the network that the systems self-organize, naturally creating correlations extending from the broadest to the most fine-grained aspects of the system. By creating such relationships inside a computer, it should be possible to recreate the emergence of life.

 

 

2. Sample Networks

 

Inspired by this possibility, I first started working on living sound systems in 1998. My first systems, implemented in MATLAB and running offline rather than in real time, treated the individual samples of a sound as the nodes in a network, then iterated the network a certain number of times. Treating the parameters of the network connections and the iterations as a "genome", I could evolve sounds that were quite interesting. However, this approach seemed to be flawed, in that the nodes of the network were determined by the way in which the computer represents sound (as a series of samples corresponding to instantaneous pressure values) rather than by the way in which the human hearing system represents it (as events in a space of both frequency and time). Since my objective was to create an intuitive "feel" for the emergence of complexity, a system whose nodes corresponded more closely with the nodes of the human hearing system seemed an attractive approach. One way to represent sound that captures both frequency and time information is wavelet analysis. It has two attractions over windowed Fourier analysis: the time resolution of the analysis is proportional to the period of the frequencies being analyzed, which corresponds more closely with real sound; and, because it does not require windowing, it does not force the sound into an artificial periodicity.
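
The sample-network idea can be illustrated with a minimal Python sketch. It is not the original MATLAB code, whose details are not given here. The ring-shaped connection pattern, the tanh squashing, the layout of the "genome" dictionary, and the mutate helper are all assumptions made for the example.

```python
import math
import random

def iterate_sample_network(samples, genome):
    """Treat each sample as a node and update it from a few other nodes.

    `genome` is a hypothetical dict: 'offsets' and 'weights' describe the
    connections, 'iterations' is how many times the network is applied.
    """
    n = len(samples)
    state = list(samples)
    for _ in range(genome["iterations"]):
        nxt = []
        for i in range(n):
            # Weighted sum over connected nodes (indices wrap around the buffer).
            total = sum(w * state[(i + off) % n]
                        for off, w in zip(genome["offsets"], genome["weights"]))
            nxt.append(math.tanh(total))  # assumed squashing, keeps samples bounded
        state = nxt
    return state

def mutate(genome, rng, scale=0.1):
    """Return a copy of the genome with slightly perturbed connection weights."""
    child = dict(genome)
    child["weights"] = [w + rng.gauss(0.0, scale) for w in genome["weights"]]
    return child

rng = random.Random(0)
seed_sound = [math.sin(2 * math.pi * 5 * t / 256) for t in range(256)]
genome = {"offsets": [-7, -1, 0, 3], "weights": [0.4, -0.6, 0.9, 0.5], "iterations": 8}
evolved = iterate_sample_network(seed_sound, mutate(genome, rng))
```

In such a scheme, selection between mutated genomes would be made by listening to the results; the sketch leaves that step to the user.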

 

 

3. Wavelet Networks

 

To explore this approach I wrote C code on a Linux box that continuously performs wavelet analysis on incoming sound. The resulting wavelet coefficients are used as the input to a network, which synthesizes a new set of wavelet coefficients that are linear combinations of the input coefficients, with weights determined by the topology of the network. These coefficients are subjected to a continuous transfer function and then resynthesized into a waveform. This waveform is normalized (to keep it within the dynamic range of the 16-bit sound system) and fed to the output, which is monitored and routed back to the input. Successive output waveforms are automatically crossfaded to ensure a continuous soundscape. The ecosystem of sound thus created evolves rapidly, creating a far wider range of output than previous systems. For sufficiently complex network topologies, the output does not seem to settle into any discernible pattern, although ephemeral patterns are frequently heard. I have tried many network topologies, and I have found that within a broad range of parameters both people and animals seem to find the output quite interesting. The size of the network of wavelet coefficients (up to several million connections between as many as 2 million coefficients) contributes to its ability to generate continually novel output for long periods of time: it has run for up to three weeks at a time in gallery settings without falling into any recognizable long-term pattern. Its output is, however, far from random.
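
The processing loop described above (analysis, linear mixing, transfer function, resynthesis, normalization, feedback) can be sketched in a few lines of Python. This is a toy illustration and not the original C implementation. The Haar wavelet, the tanh transfer function, the random sparse topology, and the 256-sample block size are all assumptions, and the crossfading of successive blocks is omitted.

```python
import math
import random

S = math.sqrt(0.5)

def haar_forward(x):
    """Full Haar decomposition of a list whose length is a power of two."""
    c, n = list(x), len(x)
    while n > 1:
        half = n // 2
        avg = [(c[2 * i] + c[2 * i + 1]) * S for i in range(half)]
        dif = [(c[2 * i] - c[2 * i + 1]) * S for i in range(half)]
        c[:n] = avg + dif          # coarse part first, detail part after it
        n = half
    return c

def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    c, n = list(coeffs), 1
    while n < len(c):
        out = []
        for a, d in zip(c[:n], c[n:2 * n]):
            out += [(a + d) * S, (a - d) * S]
        c[:2 * n] = out
        n *= 2
    return c

def make_network(n_coeffs, fan_in, rng):
    """Give every output coefficient `fan_in` random (source index, weight) links."""
    return [[(rng.randrange(n_coeffs), rng.uniform(-1.0, 1.0)) for _ in range(fan_in)]
            for _ in range(n_coeffs)]

def network_step(block, network):
    coeffs = haar_forward(block)
    # Each new coefficient is a linear combination of input coefficients.
    mixed = [sum(w * coeffs[j] for j, w in links) for links in network]
    shaped = [math.tanh(v) for v in mixed]      # assumed continuous transfer function
    wave = haar_inverse(shaped)
    peak = max(abs(v) for v in wave) or 1.0
    return [v / peak for v in wave]             # normalized to [-1, 1]; scale by 32767 for 16-bit output

rng = random.Random(1)
N = 256
network = make_network(N, fan_in=4, rng=rng)
block = [rng.uniform(-1.0, 1.0) for _ in range(N)]
for _ in range(20):                             # output is routed back to the input
    block = network_step(block, network)
```

The sketch uses a few hundred coefficients so that it runs instantly; the networks described in the text are several orders of magnitude larger.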

 

The falling away and re-emergence of patterns suggests that some sounds can recreate themselves after a number of cycles of the network; these, then, could be audio proto-organisms reminiscent of Kauffman’s autocatalytic sets, although a careful study has not been carried out. The listener’s sense of patterns in the soundscape grows with increased listening; however, many of the patterns present may be beyond what can be grasped rationally, and this may be the key to the interest of the listeners. It is hard to deny a sense of "something being there", and the level of detail and overall structure, combined with the high quality of the audio, create an intuitive, even sensual encounter with an alien ecosystem.

However, in this system the role of the creator is merely to set the "laws of physics" in this little universe and then allow it to self-organize; the result is something that is fundamentally inhuman, related to this universe only by the mind that set up the original parameters. I began to conceive of a system in which a human being could be part of the feedback loop, varying the direction of the evolution, collaborating with the audio ecosystem instead of merely observing its creativity. As I attempted to create such a system, I found that the wavelet approach had some limitations in realizing a tight feedback loop between the player and the instrument-ecosystem. While the parameters of the wavelet network can be altered during operation, the "chunkiness" of the processing delays the perception of any resulting changes. Furthermore, due to the sheer complexity of the wavelet network, the changes in the sound caused by any given change of parameters are extremely difficult to predict, take a long time to unfold, and may be difficult to distinguish from the "natural" evolution of the sound. For these reasons, I decided to try a different approach emphasizing fast response to parameter changes, while trying to preserve the organic nature of the sound. At the same time, I decided to investigate new sorts of network nodes that could be matched even more closely with the "nodes" of human hearing.

 

 

4. Filter Networks and Live Performance

 

I began to work with the sorts of filters used in electronic synthesis and obtained interesting results using arrays of networked bandpass filters in feedback loops. The architecture of a typical filter network is shown in figure 2. The input signal is fed to a number of filter nodes, each of which consists of a bandpass filter, an amplitude follower, and a variable gain. The bandpass filter frequencies are set to the desired frequencies of operation; in the current generation, the frequencies are kept static and are set so as to cover the entire audible spectrum. The bandwidths determine in part the sound of the individual filters and their interplay in the network; varying them produces interesting effects on the sound. The output of each filter is fed into an amplitude follower and into a variable gain, which is controlled by a linear sum of the outputs from the amplitude followers in the network. The time constants of the amplitude followers determine in part the rate at which information propagates through the network, which plays an important role in determining the overall character of the audio ecosystem produced. This circuit has been implemented in MAX/MSP and on Nord Modular synthesizers with up to 64 filter nodes; more nodes tend to create a higher level of detail in the sound. Other audio modifications, including delay, compression, and additional filtering, are generally used in the feedback loop along with the filter network to add "color" and interesting structure to the sound.
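
A filter node of this kind (bandpass filter, amplitude follower, and a gain set by a linear sum of follower outputs) can be sketched in Python as follows. This is not the MAX/MSP or Nord Modular patch. The biquad bandpass design, the one-pole follower, the one-sample feedback delay, the clamping of gains at zero, and every numeric value (8 nodes, an 8 kHz sample rate, the weights and biases) are assumptions chosen to keep the example small.

```python
import math

class FilterNode:
    """Bandpass biquad followed by a one-pole amplitude follower."""

    def __init__(self, freq, q, tau, fs):
        w0 = 2 * math.pi * freq / fs
        alpha = math.sin(w0) / (2 * q)
        a0 = 1 + alpha
        # Standard constant-peak-gain bandpass biquad coefficients (b1 is zero).
        self.b0, self.b2 = alpha / a0, -alpha / a0
        self.a1, self.a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0
        self.env = 0.0
        self.k = 1 - math.exp(-1 / (tau * fs))   # follower smoothing per sample

    def process(self, x):
        y = self.b0 * x + self.b2 * self.x2 - self.a1 * self.y1 - self.a2 * self.y2
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        self.env += self.k * (abs(y) - self.env)  # track the rectified output
        return y

def run_network(nodes, weights, bias, excitation, feedback=0.9):
    """Feed excitation plus the delayed network output through every node."""
    out, prev = [], 0.0
    for x in excitation:
        drive = x + feedback * prev
        ys = [node.process(drive) for node in nodes]
        envs = [node.env for node in nodes]
        # Each gain is a bias plus a linear sum of all follower outputs.
        gains = [max(0.0, b + sum(w * e for w, e in zip(row, envs)))
                 for b, row in zip(bias, weights)]
        prev = sum(g * y for g, y in zip(gains, ys))
        out.append(prev)
    return out

fs = 8000
n_nodes = 8
freqs = [100 * 30 ** (i / (n_nodes - 1)) for i in range(n_nodes)]  # 100 Hz to 3 kHz
nodes = [FilterNode(f, q=8.0, tau=0.02, fs=fs) for f in freqs]
# Assumed topology: strong self-inhibition, weak inhibition from every other node.
weights = [[-2.0 if i == j else -0.2 for j in range(n_nodes)] for i in range(n_nodes)]
bias = [1.5] * n_nodes
excitation = [1.0] + [0.0] * (fs // 2 - 1)      # a single click, then silence
output = run_network(nodes, weights, bias, excitation)
```

Changing tau alters how quickly the gains react to the followers, which is the knob the text identifies as shaping the overall character of the sound.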

 

The great advantage of the filter network circuit is that, like a more conventional feedback loop involving conventional audio effects and/or speakers and microphones, it can allow the rapid evolution of the sound in response to changes in the processing parameters. This creates the possibility of live performance through the control of these parameters; the fast response of the system makes possible a tight feedback loop between the performer(s) and the audio ecosystem. Unlike a conventional feedback loop, a filter network can be configured to eliminate the unpleasant effects of runaway feedback ("screeching"). Network connections can be set so that excessive output from one filter node sends "ripples" through the entire system that reduce the gain at that node. This behavior echoes the indirect control mechanisms of real ecosystems. In performance, this automatic avoidance of the most unpleasant sounds helps the performer to explore the many possibilities of the system with confidence that the results will have rich and varied texture.
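
The way gain reduction tames runaway feedback can be seen in a deliberately simplified scalar model, sketched here in Python. It tracks only the amplitude a of a single loop and assumes the loop gain falls linearly with that amplitude, g(a) = max(0, g0 - w*a). The model, the symbols g0 and w, and the numbers used are assumptions for illustration. The real network is multi-band and its followers respond with a lag.

```python
def settle(start, g0, w, steps):
    """Iterate a <- g(a) * a, where the gain g(a) = max(0, g0 - w * a)."""
    a = start
    for _ in range(steps):
        gain = max(0.0, g0 - w * a)   # larger amplitude means lower gain
        a = gain * a
    return a

# Without inhibition (w = 0) a loop gain of 1.5 grows without limit.
runaway = settle(0.01, g0=1.5, w=0.0, steps=100)

# With inhibition the amplitude settles where the gain equals 1,
# that is at a = (g0 - 1) / w = 1.0 for these values.
regulated = settle(0.01, g0=1.5, w=0.5, steps=100)
```

In this model the fixed point is stable for g0 between 1 and 3, because the slope of the map at the fixed point is 2 - g0.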

The original interface for this system consisted of a large number of knobs, each controlling a parameter of the system through MIDI control signals. This interface allowed the exploration of the potential of performances with audio ecosystems, but as a musical instrument it was limited in a number of important ways. The number of parameters that could be controlled simultaneously was limited, and the human mind is not designed to keep track of the positions and movements of large numbers of knobs. In an effort to create a true "instrument" based on filter networks, an interface for this system using motion capture gloves has been implemented. These gloves (5th Gloves from 5th Dimension Technologies) use fiber optics to sense the bend of each individual finger and two-axis tilt sensors to sense the pitch and roll of each hand, allowing fourteen system parameters to be controlled simultaneously in a highly intuitive manner. The bend and tilt data are sampled with 8-bit accuracy approximately 200 times a second, allowing subtle, low-latency control of the sonic environment. The extensive brain power devoted to sensing and remembering hand positions and gestures makes this interface far more powerful, and the visible motions of the performer’s hands become a point of connection with the audience as well as the system.
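
The fourteen control values (five finger bends plus pitch and roll for each of two hands) could be mapped onto system parameters along the following lines. This Python sketch does not use the glove vendor's API and does not reflect the actual mapping used in performance. The frame layout, the parameter ranges, and the function names are hypothetical.

```python
SENSORS_PER_HAND = 7          # five finger bends, pitch, roll
N_CONTROLS = 2 * SENSORS_PER_HAND

# Hypothetical target ranges, one (low, high) pair per control value:
# ten gain-like parameters, two time constants in ms, two feedback amounts.
PARAMETER_RANGES = [(0.0, 2.0)] * 10 + [(0.5, 50.0)] * 2 + [(0.0, 0.99)] * 2

def scale_reading(raw, low, high):
    """Map one 8-bit reading (0..255) linearly onto [low, high]."""
    if not 0 <= raw <= 255:
        raise ValueError("expected an 8-bit reading, got %r" % (raw,))
    return low + (high - low) * raw / 255.0

def map_frame(frame):
    """Convert one frame of 14 raw readings into 14 parameter values."""
    if len(frame) != N_CONTROLS:
        raise ValueError("expected %d readings per frame" % N_CONTROLS)
    return [scale_reading(raw, low, high)
            for raw, (low, high) in zip(frame, PARAMETER_RANGES)]

# At roughly 200 frames per second, a new set of values arrives about every 5 ms.
relaxed = map_frame([0] * N_CONTROLS)
clenched = map_frame([255] * N_CONTROLS)
```

With 8-bit readings each parameter has 256 distinct settings, so a real implementation would probably smooth the values between frames to avoid audible steps.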

In spite of the simplistic nature of the interconnections in these systems compared to those of actual living systems, the surprising beauty of their sound speaks of a whole new world of possible art that is just opening up to exploration. The ever-accelerating progress of technology suggests that the potential of this world is widening every day.

 

Ongoing research in this area includes the identification of parameters best suited to live control via the glove interface; creating a companion live-controlled video ecosystem based on these principles; investigating new network topologies and mappings between nodes and generated content; creating audio ecosystems that interact and collaborate with each other and with human musicians; and many other fascinating ideas.