Music has been helping us stay together even as we stay apart, from window serenades to video conference concerts. It is hard to imagine we will ever have a shortage of music making, not to mention the decades of existing music recordings. So we don’t need to train computers to make new music for us, but of course that hasn’t stopped us. We don’t do it so that they can replace us; we do it so we can better understand the music we make and what makes it appealing. And whether it is the intention or not, I think teaching our machines to make a joyful noise is just as worthy as making those sounds ourselves.
Computers can make music in a variety of ways. They can attempt original compositions, either by generating scores to be performed/synthesized or by combining loops and samples and bits of recordings. The newly launched OpenAI Jukebox is attempting something different, more akin to generating new recordings instead of composing new songs (worked best in Chrome for me, FYI). Rather than trying to specify notes or words, the software constructs soundwaves more directly, specifying moment to moment what frequencies should be heard. As a rough analogy, imagine reconstructing a digital image by specifying the color pixel by pixel, as opposed to assembling an image from objects like cats and baby Yodas.
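To make that contrast concrete, here is a toy Python sketch (my own illustration, not anything from Jukebox, which is a large neural network; the function names and numbers are mine). A score-style representation is a short list of symbols, while a waveform-style representation specifies the actual sound value at every instant:

```python
import math

SAMPLE_RATE = 44100  # CD-quality: samples of the sound wave per second

def score_representation(notes):
    """'Compositional' approach: music as a list of (note, duration)
    symbols, left to a performer or synthesizer to turn into sound."""
    return list(notes)

def waveform_representation(freq_hz, seconds):
    """Jukebox-style, very loosely: specify the sound wave directly,
    one amplitude value per moment in time (here, a plain sine tone)."""
    n_samples = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n_samples)]

# Half a second of the note A4 (440 Hz) in each representation:
score = score_representation([("A4", 0.5)])   # 1 symbol
wave = waveform_representation(440.0, 0.5)    # 22050 numbers
print(len(score), len(wave))
```

The point of the sketch is scale: a half-second note is one symbol in a score but tens of thousands of numbers as a waveform, which is why generating audio directly is so much harder than generating notes.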
The end result is a sort of AI version of Wheel of Musical Impressions, with reasonable simulacra of popular artists singing familiar songs as well as new ones (at least, new to them). Consider the following sample:
That’s a duet of Frank Sinatra and Ella Fitzgerald, singing a song from the recent Oscar-nonwinning film La La Land, written years after their deaths. Note that this is not exactly as straightforward as feeding the song lyrics into a speech synthesizer with samples of their voices. As a result, the recordings meander around the lyrics and occasionally veer out of the realm of existing English altogether. Thus we get examples like this rather surreal Rick Roll:
The software can use varying inputs to start from; the above starts with a few seconds of the original recording and then finishes the performance in its own idiosyncratic fashion. This is reminiscent of another OpenAI project, a text generator that made a splash a year ago. You could give it a starting sentence or two and it would generate a small document from that prompt, going well beyond the predictive text in your phone that can guess a word or two. The result is sentences and paragraphs which are often eminently readable, although they still have a tendency to meander and at times even contradict themselves. There was a sense of a student trying to pad out a term paper: no obvious problems or mistakes, but you are left wondering if anything was actually said.
Some of the music here has a similar quality, especially when there are no lyrics to provide structure. The moment-to-moment listening experience is pleasant enough, but there’s not much of a sense of melodic development. Try this soundtrack in the style of Danny Elfman, for example (clip not embeddable). If you know film music, you’ll likely recognize the Elfmanesque qualities it has identified and reproduced, but you probably won’t want to hire Jukebox for your next blockbuster.
One of the secondary outcomes is a clustering of existing songs and artists. Broad genre groupings are apparent, although probably more interesting are artists at the edges of genres and also subgenre clusters. For example, hip hop seems to have two distinct groups, and the classical composers tend to be closer to fellow members of the same era and/or movement. Even if you don’t want to listen to the synthetic music, you might find it interesting to browse through these clusters.
One last sample for you, a take on contemporary Christian music. Maybe something to keep in mind if your church’s worship leaders need a break from streaming.
There are over 7,000 recordings to explore; take a listen and share any interesting finds in the comments!