Techno-Vocality

[Saturday @ 3:00pm – 5:00pm, Room 6]

Cathy Lucas

“Performing Speech: Voice as Instrument in Wolfgang von Kempelen’s Mechanical Speech Project”


Abstract

In this paper, I propose a ‘new organological’ reading of Wolfgang von Kempelen’s speaking machine to shed light on Early Modern analogies between the voice and musical instruments, and to consider the character of the expertise required for the performance of mechanised speech on the device. Historiographically, the speaking machine has often been placed alongside other automata as a material repository for new kinds of mechanically codified knowledge about the body. More focused histories of speech imitators have followed the nineteenth-century mechanist Robert Willis in highlighting how, in contrast to later acoustical devices, von Kempelen’s machine was closely modelled on human vocal physiology. In the first part of this paper, I use von Kempelen’s published manual on the mechanism of speech (1791), together with historical and modern-day replicas of his instrument, to highlight the role of musical instruments in the design and to draw attention to the importance of practised, gestural knowledge for ‘playing’ the machine. Much more than the mechanised sum of its parts, speech here was a collaboration between human performer and moulded matter. Read in this way, the device registers an epistemology of speech as a skill that could be learned, developed, and perfected, an intervention I consider, in the final part of the paper, in relation to contemporary debates around the origin, development, and status of language and speech.

Biography

Cathy Lucas is a musician, producer and PhD researcher living in London. She is based in the Department of Science and Technology Studies at UCL and the Science Museum, and is currently a visiting fellow at the Smithsonian Institution, working with their acoustics and electricity collections. Her research interests include sound and communication technologies, voice science, acoustics, and music in the nineteenth century, and she has a particular interest in exploring the connections between musical and scientific instruments. Cathy has also spent over a decade collaborating in London’s experimental music scene, chiefly as the bandleader and producer of the art-pop group Vanishing Twin. More recently she has combined her research and musical interests in projects that integrate sound into storytelling about the history of sound technologies.

Will Mason

“Holly, Plus Whom?: The Holly+ Deepfake and Musical Labor under Artificial Intelligence”


Abstract

In July 2021, the electronic musician and scholar Holly Herndon launched Holly+, a vocal deepfake modeled on her singing voice. Holly+ is powered by an artificial intelligence engine trained on recordings of Herndon’s voice; it was created in collaboration with Herndon’s partner Mat Dryhurst, and Yotam Mann and Chris Deaner of Never Before Heard Sounds. Holly+ is an example of “timbre transfer,” where a deep learning engine is used to map the timbre of a source sound onto the timbre of a target sound. Holly+ is also an art piece unto itself: Herndon has placed questions of vocality and identity at the fore of the project. “The voice is inherently communal,” she wrote in announcing Holly+, “learned through mimesis and language, and interpreted through individuals. In stepping in front of a complicated issue, we think we have found a way to allow people to perform through my voice.” A Decentralized Autonomous Organization (DAO) established by Herndon owns the intellectual property for works created using Holly+ and can govern their usage and collect royalties.


My presentation will introduce Holly+ as outlined by Herndon and Dryhurst. I will consider what is at stake if Holly+ is understood as an instrument, and perhaps merely as an instrument. Such a view illuminates labor implications of the technology that are simultaneously much less novel and much more urgent than the deepfake framing suggests. One of the most provocative questions Herndon asks is whether her vocal deepfake means someone else could perform a Holly Herndon concert in her stead. While the DAO helps address intellectual property concerns—a vital consideration in the age of the “Blurred Lines” lawsuit, to be clear—the questions about labor that the Holly+ team have raised will require urgent thought as AI-powered timbre transfer technology continues to develop.

Biography

Will Mason is Associate Professor of Music at Wheaton College in Norton, Massachusetts, where he teaches courses in music technology and music theory. He is co-editor of the forthcoming Oxford Handbook of Spectral Music, and is at work on a monograph about metaphors of construction in audio recording. He is also active as a composer of electroacoustic music, as an audio engineer, and as a jazz drummer.

Zeynep Bulut

“Biosensing Musical Interfaces as Tactile Speech”


Abstract

This talk will discuss biosensing musical interfaces (such as the BioMuse, created by Benjamin Knapp and Hugh Lusted in 1988, the Lady’s Glove, devised by Laetitia Sonami in collaboration with Paul DeMarinis in 1991, and the Xth Sense, conceived by Marco Donnarumma in 2010) as a case of tactile speech. Referring to histories of vibrotactile communication and speech recognition and synthesis (Mills 2011-2012, Parisi 2018, Mills & Li 2019) and digital health studies (Schüll 2016), I suggest that biosensing musical interfaces are both touch-driven and voice- and speech-driven technologies.

Biosensing musical interfaces are interactive and kinetic systems. They comprise sensors that detect the performer’s physical gestures and bioelectrical or biophysical signals, together with hardware and software that amplify, filter, and digitize those signals and translate them into audio. With this set-up, they interactively shape bodily gestures, signals, and sounds. The interfaces have been explored in relation to human-computer interaction, digital musical instruments, affective computing, machine and human learning, assistive technology, and music and wellbeing.

In media and communication studies, tactile speech is considered with respect to hearing and speech technologies developed for blind, deaf, and hard-of-hearing people. Some of these technologies include hearing gloves and signing gloves. Wearable technologies designed for monitoring heart rate, movement, and sleeping patterns also operate on the basis of tactile transmission of bodily signals. The technical procedures and the prosthetic aspects of tactile speech technologies, wearable health technologies, and biosensing musical interfaces demonstrate similarities. However, their contexts, applications, and cultural implications differ. Tactile speech technologies such as signing gloves aim at functional translation and linguistic communication by using auditory and verbal tools. Wearable health technologies quantify bodily signals and help assign particular meanings to numeric classifications. Biosensing musical interfaces do not necessarily provide functional translation or linguistic exchange. Exploring expression and stimulation through intentional variation and control of physical gesture, they suggest a different case of tactile speech.

Addressing the convergences and divergences across these technologies, this talk will examine how skin and voice can be treated as media for stimulation, expression, and exploration, as well as for articulation, computing, and translation, and how biosensor performances evoke voice as skin, a multi-sensory interface that prompts us to revisit what we mean by mediation and communication.

Biography

Zeynep Bulut is a Lecturer in Music at Queen’s University Belfast. Prior to joining the faculty at QUB, she was an Early Career Lecturer in Music at King’s College London (2013-2017), and a post-doctoral research fellow at the ICI Berlin Institute for Cultural Inquiry (2011-2013). She received her PhD in Critical Studies/Experimental Practices in Music from the University of California, San Diego (2011). Her research interests include voice and sound studies, experimental music, sound and media art, technologies of hearing and speech, voice and environment, and music and medicine. Her first manuscript, Building a Voice: Sound, Surface, Skin (under contract with Goldsmiths Press), explores the emergence, embodiment, and mediation of voice as skin. Her articles have appeared in various volumes and journals including Perspectives of New Music, Postmodern Culture, and Music and Politics. Alongside her scholarly work, she has also exhibited sound works, composed and performed vocal pieces for concert, video, and theatre, and released two singles. Her composer profile has been featured by the British Music Collection. She is a certified practitioner of Deep Listening, and project lead for the research network Music, Arts, Health, and Environment (MAHE), supported by the Economic and Social Research Council’s Impact Acceleration Account at QUB.

Stefan Greenfield-Casas

“Virtual Ventriloquism: The Live-2D Hyperreal”


Abstract

Since 2020, VTubers, streamers who use a motion-captured animated model while they stream, have taken the internet by storm. On October 20, 2020, Hololive Production’s Gawr Gura became the first VTuber to reach one million subscribers on YouTube; on February 16, 2022, the VTuber Ironmouse became the most subscribed-to streamer on all of Twitch, with over 95,000 paying subscribers. While these VTubers are, like most streamers, well known for playing games on stream, members of Hololive, the most popular VTuber agency, are known for another reason as well: their status as “idols,” and the songs they produce and sing.

In this paper I argue that part of Hololive’s success rests on its talents’ voices, and that their branding as idols highlights the voice as a fetish object that tethers the “live 2D” virtual model to the material “real” world. Furthering Hayley Fenn’s recent claim that puppets afford a kind of “inherent musicality” (2022), I draw from theories of voice (Barthes 1977; Garza 2022), idol culture (Richardson 2016; Bridges 2022; Lo 2023), and Japanese media (Otsuka 1989; Steinberg 2012) to show that the voice becomes an important element in bringing these virtual puppets to life. This extends beyond the VTuber avatar, however, to construct the “media mix” that fans buy into and in which they participate. This includes fans purchasing merchandise (voice packs, limited-edition albums), attending virtual karaoke sessions, and even using the talents’ voices or music in their own creations (e.g., fan-produced games). I conclude by showing how this vococentric model has been upheld but also challenged and critiqued by other VTubers, particularly independent VTubers who are not managed by an agency.

Biography

Stefan Greenfield-Casas is Visiting Assistant Professor of Music Theory at the University of Richmond. His research focuses on the intersection(s) of music, myth, memory, and media, recently by way of the classical arrangement and concertization of video game and film scores. He has presented his research at various conferences across the US, Europe, and Asia, including meetings of the International Musicological Society’s Music and Media Study Group, the Royal Musical Association’s Music and Philosophy Study Group, the American Musicological Society, Music and the Moving Image, Ludomusicology, and the North American Conference on Video Game Music. Stefan’s publications can be found or are forthcoming in The Music of Nobuo Uematsu in the Final Fantasy Series, The Oxford Handbook of Arrangement Studies, The Oxford Handbook of Video Game Music and Sound, and Translight: A Contemporary Gaming Magazine.