The EMO-Synth
An emotion-driven music generator
This paper gives an overview of the EMO-Synth project, which brings together biofeedback, artificial intelligence, affective computing, advanced statistical modelling, and automatic music and image generation. The first part gives a brief description of the system and its workings. The second part deals with the approach to artistic creativity that lies at the base of the EMO-Synth project. The final part situates the EMO-Synth project within a broader context of art and science. The EMO-Synth project was realised with the support of the Flemish Audiovisual Fund, Flanders Image and the Centre for Digital Cultures and Technology.
The EMO-Synth
The EMO-Synth is a new interactive multimedia system capable of automatically generating and manipulating sound and image to bring the user to certain predefined emotional states. During performances, the emotional responses of the user are measured using biosensors (the EEG Trainer [Fig. 2] and BioTrace software by Mind Media B.V.) that register psycho-physiological parameters such as heart rate (electrocardiogram, or ECG), stress level (galvanic skin response, or GSR) and muscle tension (electromyogram, or EMG).
Using the EMO-Synth involves two stages: a learning phase and a performance phase. In the learning phase, the EMO-Synth successively generates auditory artefacts and analyzes their emotional impact on the user. Using machine learning techniques and statistical modelling, the EMO-Synth then learns in an adaptive way to generate sounds and music that bring the user to certain predefined emotional states. In this way the user's emotional feedback is matched to sound by artificial intelligence models constructed by the EMO-Synth. Once the learning phase is complete, the EMO-Synth is ready to be used as a real-time responsive multimedia tool. This is the performance phase. During performances, the models constructed in the learning phase are used to produce real-time personalized soundtracks for live visuals. The soundtracks involve digitized sounds and live musicians, while the visual material is generated by the EMO-Synth and partially controlled by the user, who is positioned in front of an audience (Fig. 3; Video 1). The live audiovisual concerts resulting from this experience are intended to be unique and entirely based on the personal emotional feedback of the user. During every performance, the EMO-Synth seeks to maximize the emotional impact of the generated sound and image on the user.
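To make the learning phase concrete, here is a minimal sketch of how such an evolutionary learning loop could be organized. It is an illustration under heavy assumptions: the "genomes" are plain parameter vectors rather than the sound-generating programs evolved by genetic programming in the actual system, and play_and_measure() is a dummy stand-in for playing a sound and reading the user's biosensors.

```python
import random

TARGET_STATES = [0.2, 0.4, 0.6, 0.8]  # low / low-average / high-average / high arousal

def random_genome():
    # Stand-in for a sound-generating program: three synthesis parameters.
    return [random.uniform(0.0, 1.0) for _ in range(3)]

def play_and_measure(genome):
    # In the real system this would play the generated sound and read the
    # user's GSR/ECG/EMG response; here a dummy value keeps the sketch runnable.
    return sum(genome) / len(genome) + random.gauss(0.0, 0.05)

def evolve_for(target, population_size=20, generations=30):
    """Evolve genomes whose measured arousal approaches the target state."""
    population = [random_genome() for _ in range(population_size)]
    for _ in range(generations):
        scored = sorted(population, key=lambda g: abs(play_and_measure(g) - target))
        parents = scored[: population_size // 2]  # keep the best-scoring half
        children = [[p + random.gauss(0.0, 0.1) for p in random.choice(parents)]
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return min(population, key=lambda g: abs(play_and_measure(g) - target))

if __name__ == "__main__":
    # Train one "model" (here: one genome) per target arousal state.
    models = {target: evolve_for(target) for target in TARGET_STATES}
    for target, genome in models.items():
        print(target, [round(p, 2) for p in genome])
```

In the performance phase, the genome trained for a given target state would then drive real-time sound generation instead of a dummy measurement.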
The development of the EMO-Synth relies on a broad range of scientific and artistic disciplines, including affective computing, artificial intelligence techniques such as genetic programming (Goldberg 1989, Koza 1992, 1994), advanced statistical modelling, and algorithmic sound and image generation.
The idea of using biofeedback in the performance arts is of course not new. Among the first to use biofeedback were Alvin Lucier, with his ground-breaking Music for Solo Performer (1965), and the pioneering David Rosenboom, author of the essential Biofeedback and the Arts: Results of early experiments (cf. Rosenboom 1974). Over the years, biofeedback has been integrated into a whole range of artistic contexts and projects, and its use has been extensively described in numerous references (e.g. Arslan et al. 2005 and 2006, Filatriau and Kessous 2008, Knapp and Lusted 1990, Nagashima 2003). But whereas many artistic projects involving biofeedback — e.g. Sensorband (1993–2003), Biomuse Trio (2008), InsideOUT (2009) and The Heart Chamber Orchestra (2010) — focus on the sonification of biometric data, 1[1. See “The Biomuse Trio in Conversation: An Interview with R. Benjamin Knapp and Eric Lyon” by Gascia Ouzounian and “The Heart Chamber Orchestra: An Audio-visual real-time performance for chamber orchestra based on heartbeats” by Peter Votava and Erich Berger in this issue of eContact!] the EMO-Synth project adds a new dimension. Music and sound generated by the EMO-Synth are the result of far more than pure sonification: the system genuinely tries to understand emotional responses to sound or music and uses this information to build appropriate artificial intelligence models. These artificial intelligence models are at the heart of the music-generating algorithm of the EMO-Synth. To allow maximum flexibility, music generation also incorporates different conventional sources: MIDI-based data streams, sample-based audio streams and even live musicians by means of virtually generated scores. Thanks to this implementation, the music and sound generated by the system can be very diverse, from tonal jazz or pop music, through more textural soundtracks, to truly experimental avant-garde. In this way, the personal musical taste of the individual user is embedded as much as possible in the resulting performance.
It is important to note, however, that one cannot discuss musical taste, meaning or emotional impact separately from cultural context. As an example, consider Balinese Angklung music. To many who grew up in Western culture, this type of music might sound pleasing. But situated within its true cultural context, Angklung music has very specific connotations and meanings for the Balinese listener. As it is most often used at funerals and accompanies death rituals, it evokes strong emotions of sadness and sacred sweetness in those listeners.
While developing the EMO-Synth, we were well aware of this cultural and even personal aspect of the emotional impact of musical stimuli. The system was therefore designed so that music and sound generation can, if desired, draw on customized sounds organized in embedded databases. Using the database system enables users to incorporate their own musical identity. This source of music generation can moreover be combined with an algorithmic harmonic and tonal music generation engine. At present this engine is based on the Western musical idiom, but a desirable extension of future prototypes of the EMO-Synth would be to implement other harmonic and tonal systems.
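As a purely illustrative toy example of what a harmonic and tonal generator rooted in the Western idiom might do, the sketch below walks over diatonic triads in C major. It is not the EMO-Synth's actual engine, whose internals are not documented here; the scale, chord choices and function names are invented for illustration.

```python
import random

# Diatonic scale of C major, as MIDI note numbers.
C_MAJOR_SCALE = [60, 62, 64, 65, 67, 69, 71]  # C D E F G A B

def triad(degree):
    """Build the triad on a 0-based scale degree by stacking thirds,
    raising an octave whenever the scale wraps around."""
    return [C_MAJOR_SCALE[(degree + i) % 7] + 12 * ((degree + i) // 7)
            for i in (0, 2, 4)]

def progression(length=4):
    # A simple random walk over common degrees: I, IV, V and vi.
    return [triad(random.choice([0, 3, 4, 5])) for _ in range(length)]

print(progression())  # e.g. [[60, 64, 67], [67, 71, 74], ...]
```

A non-Western extension would amount to swapping in a different scale and chord vocabulary, which is precisely the kind of generalization envisaged for future prototypes.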
The first prototype of the EMO-Synth dates back to 2004. At that time, the prototype was essentially intended as a proof of concept to explore possibilities and map out future directions for the project. Between 2004 and 2007 the first three prototypes of the system were developed. To elaborate the various aspects of the growing project, the collective Office Tamuraj was founded in 2007, combining the expertise of people with different backgrounds. This led to the realisation of the fourth prototype in 2008. These first four prototypes of the EMO-Synth did not yet involve any visual output. Between 2008 and 2010, the development of the fifth prototype within the interactive multimedia project Montage Cinema Revived by the EMO-Synth was supported by the Flemish authorities via the Flemish Audiovisual Fund.
In Montage Cinema Revived by the EMO-Synth, the EMO-Synth was used for the first time in a cinematographic setting. During performances of the project, the EMO-Synth automatically generates personalized soundtracks with maximal emotional impact for re-edited versions of any given movie. Video material used during performances includes the classic The Phantom of the Opera by Rupert Julian (1925) as well as dedicated material commissioned from various video artists. The most current prototype of the EMO-Synth is Prototype 5.2, which incorporates several improvements over the initial Prototype 5.0 developed between 2008 and 2010.
The EMO-Synth, Biofeedback and Artistic Creativity
The development of the EMO-Synth is situated in the promising domain of affective computing, founded by Rosalind W. Picard at MIT (Picard 1995 and 1997). Researchers in this fascinating domain are trying to create emotional man-machine interactions, making use of biofeedback techniques to monitor the user's emotional state. The use of affective computing in the EMO-Synth project is also crucial for incorporating the concept of artistic creativity.
Working in both mathematics and music, it is my personal point of view that at the core of every artwork lies its ability to communicate through human emotions. This also pertains to artistic actions intended to express no emotion; a lack of emotion can also be considered an emotional state. It is my experience that the juxtaposition of emotion and non-emotion is crucial in any artistic creative process. Moreover, it is seminal to artistic meaning and contextuality. This principle of juxtaposition as a driving force also lies at the heart of Zen Buddhism (Humphreys 1970; Suzuki 1970), which was and still is one of the major inspirations for the development of the EMO-Synth project. The study of Zen Buddhism can in various ways help one to discover interesting new manners of connecting Art and Science intuitively (Pirsig 1974). Too often these two fields are placed in opposition. But when one fully understands the non-duality principle of Zen 2[2. The non-duality principle essentially states that apparent dualities in the world we live in, such as cold-warm, luck-misfortune, happy-depressed, etc., are non-existent. They are just manifestations of an unobserved underlying state of mind. Eliminating the thought and sensation of duality is one of the main subjects of Zen Buddhism.], it rapidly becomes clear that Art and Science are in fact manifestations of one and the same mental activity or state. And it is this state or activity that I believe is closely related to the core of what can be considered artistic creativity. With the EMO-Synth project, new ways are sought to integrate this way of thinking into a concrete multimedia project that stresses both the rational and irrational parts of human behaviour.
Within this spirit, as the EMO-Synth progressed, I always kept in mind the constraints inherent in any programmable, discrete and finite machine such as a computer. As human thought and behaviour seem to incorporate continuous, infinite and chaotic processes (Strogatz 2000), the main goal of the EMO-Synth project is not to replace human artistic creativity but to enhance or extend it. The important historical failure of the cognitive paradigm (De Mey 1992) in artificial intelligence (AI) serves as a milestone here. At its core, this paradigm deals with the recreation of human intelligence by an artificial system such as a computer or any other digital system. Over its long development, this paradigm caused the AI community to split into two groups. On the one hand, there is the Strong AI group (Goertzel and Wang 2007), who believe that creating an artificially intelligent system is just a matter of computational power and algorithmic improvement. On the other hand, there are the Weak AI adherents (Searle 1980), who reject the idea that any autonomous artificial system can incorporate true intelligence. Over the last decades it has become increasingly clear that recreating an artificial brain may not be within immediate reach. One result of this realization is that a completely new field of artificial intelligence has arisen, namely, as mentioned above, affective computing (Picard 1995, 1997). The central question in this field is how to study and develop tools for emotional man-machine interaction. In affective computing, researchers are not looking to replace human presence or emotion but to extend it, or to realize meaningful communication between man and machine. As the starting point of the EMO-Synth project is human emotion as a key connecting rational and irrational human thought, affective computing is crucial to its development.
With the previous discussion on artistic creativity in mind, two basic principles for the EMO-Synth project can be formulated:
- The EMO-Synth can be seen as a new virtual tool with which any audio-visual artist can extend and complement his or her own creative process. It could moreover become an inspirational source: by using the system, the user can come to understand the creative process on a whole new level.
- By using the EMO-Synth, a totally new virtual platform arises in which the boundaries between artist, audience and the generated artefacts are redefined. To what extent are these artefacts a creation of the artist or of the audience? To what extent can we speak of artistic input when the EMO-Synth participates in the creative process? And a question of particular relevance in today’s context of digital media: who is the author of what is being generated?
From its numerous applications it is clear that biofeedback is essential in capturing the emotional state of the subject. This discussion of artistic creativity therefore leads to the conclusion that biofeedback techniques play a key role in integrating the artistic creative process into the EMO-Synth, on both a conceptual and a concrete level.
The EMO-Synth: Combining Art and Science
The EMO-Synth project can be understood both as an art project and as a science project. Reasons to label it artistic include the fact that it is not presented in a purely scientific setting and that no formal data collection and analysis are presented to the audience. On the other hand, there are also reasons to label the EMO-Synth project a science project. As the audio-visual material is to an extent generated algorithmically by the EMO-Synth, the question of who the artist is arises during performances. The audience might consequently wonder whether there is any artistic process involved in the project at all. Seen in this light, the project appears more as a scientific product development case, in which the EMO-Synth might one day become a commercial product.
Developing the EMO-Synth has also inspired new ways of looking for possible synergies between Art and Science.
The need to define the EMO-Synth project purely as either an art or a science project implies a general world view in which the arts and sciences are two separate domains based on different rules and mechanisms. But as already stated, crucial to my work on the EMO-Synth is my firm belief that Science and Art are essentially expressions of one and the same inner creative personal world. 3[3. See also Popper (2007) on the immersion between art and technology and Bateson (1979) on a holistic and more intuitive view of science.] It was during my work in the domains of theoretical mathematics, statistics and artificial intelligence and, in parallel, in multimedia art and music, that I experienced on a number of occasions the striking similarities between the two creative processes. This helped me to clarify the resemblance between mathematical research and the artistic process involved in, for example, composing, writing or producing a piece of music. Whereas mathematics is mostly thought of as being all about rationality, quite the contrary is true. Designing mathematical proofs, for example, is perhaps 80% non-rational mathematical intuition and 20% having the right mathematical tools and techniques to complement that intuition. Learning to become a true mathematician is a process very similar to learning to become a skilled jazz musician. Essentially it boils down to incorporating some crucial intuition and knowledge until it becomes part of your being and even your body, so to speak. Of course one could argue that becoming a skilled jazz musician requires years of hard study and training and that intuition cannot be learned. But it is my view that through the process of learning to become a skilled jazz musician, one seeks a perfect integration of and balance between intuition and skilled knowledge in oneself. In a truly skilled artist, intuition and skills are two sides of the same coin that cannot exist without each other. You could say that you have to try to get the mathematics “in your veins” to let it “become part of yourself.” Unfortunately, the irrational, intuitive side of knowledge is more and more neglected in our growing technocratic society, ruled by so-called statistical models which are believed to have absolute predictive power.
Using the ideas described above, I would conclude this section by stating that the EMO-Synth project is both an Art and a Science project. As no clear distinction can be made between Art and Science, both fields can be considered to be incorporated in the project. Moreover, it is my hope that the EMO-Synth project can in time serve as one of many examples of how to bridge the apparent gap between Art and Science in an intuitive and technological way.
Music and Image Generation in the EMO-Synth
As previously mentioned, music generation during EMO-Synth performances draws on three sources, which can be combined according to the choice of the user. The first source consists of audio material organized in a database that the EMO-Synth can consult during music generation. The audio material in this database can be completely customized; if used by an artist, the database can consist of his or her own audio clips. The second source of music or sound generation is implemented through MIDI data streams. MIDI data is generated by the EMO-Synth and sent to several MIDI channels, which can carry percussive lines as well as harmonic and tonal material. As MIDI data itself contains no sound, using MIDI requires that the relevant MIDI channels are connected to appropriate software or hardware synthesizers. The third and final source of sound generation is the use of live musicians: if desired, the EMO-Synth can generate virtual scores in real time during performances, and these virtual scores direct the live musicians. The audio material, MIDI data and virtual scores are all generated by artificial intelligence models in the EMO-Synth that were trained in previous learning sessions.
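The following sketch illustrates how these three sources might be combined into a single event stream. All names (SoundEvent, next_events, the clip path scheme) are illustrative assumptions and not the actual EMO-Synth API.

```python
import random
from dataclasses import dataclass

@dataclass
class SoundEvent:
    source: str      # "sample", "midi" or "score"
    payload: object  # clip path, MIDI message list, or score fragment

def next_events(arousal_target, use_samples=True, use_midi=True, use_musicians=False):
    """Produce one batch of events, one per enabled source."""
    events = []
    if use_samples:
        # Consult the customizable audio database for a clip annotated
        # with the desired arousal level (path scheme is hypothetical).
        events.append(SoundEvent("sample", f"clips/{arousal_target}/{random.randint(0, 9)}.wav"))
    if use_midi:
        # MIDI carries no sound itself; these messages would be routed
        # to software or hardware synthesizers.
        note = random.randint(48, 72)
        events.append(SoundEvent("midi", [("note_on", note), ("note_off", note)]))
    if use_musicians:
        # Virtual scores direct the live musicians in real time.
        events.append(SoundEvent("score", f"bar for arousal level '{arousal_target}'"))
    return events

print(next_events("high", use_musicians=True))
```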
As already mentioned, using the EMO-Synth involves two stages: a learning phase and a performance phase. During the learning phase, the EMO-Synth learns how to bring the test subject into four different states of arousal: low, low-average, high-average and high arousal.
The motivation to work with states of arousal can be found in general emotion psychology. 4[4. For an introduction to emotion psychology, see Paul Ekman and Richard J. Davidson’s The Nature of Emotion: Fundamental Questions and George Mandler’s Mind and Body: Psychology of emotion and stress. Within this field we follow the theory of the emotion plane as developed by Robert Thayer (1989) and applied to music by Emery Schubert (2004).] According to this theory, human emotions can be categorized along two dimensions: valence and arousal. Valence pertains to the positive or negative character of an emotion, arousal to its intensity. 5[5. For information on these dimensions, refer to Leman et al. (2005) and Leman et al. (2004).] As the valence component cannot be measured using classical biofeedback devices, we chose to work only with the arousal component of the emotional state in the EMO-Synth project. During the learning phase, the artificial intelligence models used for sound generation are trained, making use of machine-learning techniques such as genetic programming. Once the learning phase is complete, the EMO-Synth is ready to be used in the performance phase.
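As a minimal sketch of how biosensor readings might be reduced to one of the four arousal states, consider the following; the equal weighting and the thresholds are invented for illustration and are not taken from the actual system.

```python
def arousal_level(gsr, heart_rate, emg):
    """Map normalized biosensor readings (each in [0, 1]) to one of the
    four arousal states; equal weights and thresholds are assumptions."""
    score = (gsr + heart_rate + emg) / 3.0
    if score < 0.25:
        return "low"
    elif score < 0.5:
        return "low_average"
    elif score < 0.75:
        return "high_average"
    return "high"

print(arousal_level(gsr=0.8, heart_rate=0.7, emg=0.9))  # -> "high"
```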
During the performance phase, the EMO-Synth uses its knowledge of the emotional reactions of the user, captured in the trained artificial intelligence models, to compose appropriate sound and music. For video generation during performances, the EMO-Synth uses video clips contained in a customizable database. Each video clip is annotated with its arousal quality: low, low-average, high-average or high. During the performance phase, the EMO-Synth composes a movie from the video clips in the database. At the same time, the system generates music and sound intended to produce in the user the arousal level indicated by each clip’s annotation. The result is an audio-visual experience in which the emotional impact of image and sound on the user is maximized.
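To summarize the performance phase in code, the sketch below picks clips from an annotated database and pairs each with sound aimed at the same arousal level. The database contents and helper names are illustrative; in the real system the sound would come from the trained artificial intelligence models rather than a placeholder string.

```python
import random

# Stand-in for the customizable video database: clips grouped by
# their annotated arousal quality (file names are invented).
VIDEO_DB = {
    "low":          ["calm_sea.mov", "clouds.mov"],
    "low_average":  ["street.mov"],
    "high_average": ["crowd.mov"],
    "high":         ["chase.mov", "storm.mov"],
}

def compose_performance(arousal_sequence):
    """For each target arousal state, choose a matching clip and request
    sound at the same arousal level (stubbed here as a string)."""
    timeline = []
    for target in arousal_sequence:
        clip = random.choice(VIDEO_DB[target])
        sound = f"sound generated for '{target}' arousal"  # trained model would go here
        timeline.append((clip, sound))
    return timeline

for clip, sound in compose_performance(["low", "high_average", "high"]):
    print(clip, "|", sound)
```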
Bibliography
Arslan, B., et al. “From Biological Signals to Music.” Enactive 05. Proceedings of the 2nd International Conference on Enactive Interfaces (Genoa, Italy, 17–18 November 2005).
Arslan, B., et al. “A Real-time Music Synthesis Environment Driven with Biological Signals.” ICASSP 2006 Proceedings. Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (Toulouse, France, 14–19 May 2006).
Bateson, Gregory. Mind and Nature: A Necessary Unity. Advances in Systems Theory, Complexity and the Human Sciences. Hampton Press, 1979.
Beyls, Peter. Interactivity in Context / Interactiviteit in Context. KASK Cahiers 4. Ghent: Hogeschool Gent, 2006.
Corne, David W. and Peter J. Bentley, eds. Creative Evolutionary Systems. Morgan Kaufmann Publishers, 2002.
De Mey, Marc. The Cognitive Paradigm: An Integrated Understanding of Scientific Development. Chicago: University of Chicago Press, 1992.
Ekman, Paul and Richard J. Davidson. The Nature of Emotion: Fundamental Questions. Oxford University Press, 1994.
Filatriau, Jean-Julien and Loïc Kessous. “Visual and Sound Generation Driven by Heart, Brain and Respiration Signals.” ICMC 2008. Proceedings of the International Computer Music Conference (Belfast: SARC — Sonic Arts Research Centre, Queen’s University Belfast, 24–29 August 2008).
Goertzel, Ben and Pei Wang, eds. Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms. Proceedings of the AGI Workshop 2006. IOS Press, 2007.
Goldberg, David E. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
Humphreys, Christmas. Zen Buddhism. New York: Macmillan, 1970.
Knapp, R. Benjamin and Hugh S. Lusted. “A Bioelectric Controller for Computer Music Applications.” Computer Music Journal 14/1 (Spring 1990) “New Performance Interfaces (1),” pp. 42–47.
Koza, John R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
_____. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, 1994.
Lehmann, Andreas C., John A. Sloboda and Robert H. Woody. Psychology for Musicians: Understanding and Acquiring the Skills. New York: Oxford University Press, 2007.
Leman, Marc, et al. “Prediction of Musical Affect Attribution Using a Combination of Structural Cues Extracted from Musical Audio.” Journal of New Music Research 34/1 (2005) “Expressive Gesture in Performing Arts and New Media,” pp. 37–67.
Leman, Marc, et al. “Correlation of Gestural Musical Audio Cues and Perceived Expressive Qualities.” Gesture-Based Communication in Human-Computer Interaction. Lecture Notes in Artificial Intelligence 2915 (Berlin: Springer Verlag, 2004), pp. 40–54.
Mandler, George. Mind and Body: Psychology of emotion and stress. New York: Norton, 1984.
Miranda, Eduardo Reck. “On Interfacing the Brain Directly with Musical Systems.” Leonardo 38/4 (August 2005), pp. 331–336.
Nagashima, Yoichi. “Bio-Sensing Systems and Bio-Feedback Systems for Interactive Media Arts.” NIME 2003. Proceedings of the 3rd International Conference on New Instruments for Musical Expression (Montréal: McGill University — Faculty of Music, 22–23 May 2003), pp. 48–53.
Picard, Rosalind W. “Affective Computing.” MIT Technical Report 321, 1995.
_____. Affective Computing. MIT Press, 1997.
Pirsig, Robert M. Zen and the Art of Motorcycle Maintenance. William Morrow & Co., 1974.
Popper, Frank. From Technological to Virtual Art. Leonardo Book Series. MIT Press, 2007.
Rosenboom, David. Biofeedback and the Arts: Results of early experiments. Vancouver: Aesthetic Research Centre of Canada, 1974.
Schubert, Emery. “Modeling Perceived Emotion with Continuous Musical Features.” Music Perception 21/4 (Summer 2004), pp. 561–585.
Searle, John R. “Minds, Brains and Programs.” Behavioral and Brain Sciences 3/3 (December 1980), pp. 417–457.
Strogatz, Steven H. Nonlinear Dynamics and Chaos. Perseus Publishing, 2000.
Suzuki, Shunryu. Zen Mind, Beginner’s Mind. New York; Tokyo: Weatherhill, 1970.
Thayer, Robert E. The Biopsychology of Mood and Arousal. New York: Oxford University Press, 1989.