Deep neural audio synthesis. The goal here is not to create sounds that are indistinguishable from the training data (i.e. capable of passing an ‘audio Turing test’), but to create sounds that are reminiscent of the training data yet novel and un-human, still interesting, and, most importantly, sounds that a human user can expressively and meaningfully manipulate in realtime (e.g. see Deep Meditations, Ultrachunk, Nimiia Cétiï). More information and paper coming soon.
Grannma MagNet – Granular Neural Music & Audio with Magnitude Networks (2018)
Chopin Drones
Slow slithering slices in space. Trained on Chopin Nocturnes.
Long sequence morphs aka #DeepMix.
These are not crossfades but morphing trajectories in latent space z, which creates interesting ‘crossover’ sounds containing characteristics of both source and target (a minimal sketch of the idea follows the list of timestamps below). Especially worth noting:
0:04 – 0:08 : vocal morph
0:43 – 0:46 : vocals to plucked string morph; at 0:46 both vocals and strings play the same note (not in the original material)
1:05 – 1:15 : transition from vocals to dubstep
1:15 – 1:25 : dubstep to birdsong
1:25 – 1:35 : birdsong to darbuka
2:05 – 2:15 : the sitar overtones acquire a piano-like quality (or the piano starts playing sitar-overtone-like textures)
2:32 – 2:35 : vocals sing the piano’s melody; those vocals singing that melody don’t exist in the original material
2:50 – 2:55 : trumpet to speech morph
3:05 – 3:25 : voice takes on the guitar’s effects: distortion, reverb, etc.
3:35 – 3:50 : distorted guitar to vocals
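For anyone curious about the mechanics, here is a minimal sketch of the idea, assuming a frame-based autoencoder working on magnitude spectra. The encode() and decode() functions are hypothetical stand-ins (random linear maps) included only so the snippet runs end-to-end; they are not the actual Grannma MagNet model, and the spectrum/latent sizes are illustrative.

```python
# Minimal sketch: morphing as a trajectory in latent space z, not a crossfade.
# encode()/decode() are hypothetical stand-ins so the sketch runs; swap in a
# real trained model to do this for real.
import numpy as np

rng = np.random.default_rng(0)
N_BINS, Z_DIM = 513, 64                           # illustrative spectrum / latent sizes

W_enc = rng.normal(scale=0.01, size=(N_BINS, Z_DIM))
W_dec = rng.normal(scale=0.01, size=(Z_DIM, N_BINS))

def encode(mag):                                  # magnitude spectrum -> latent vector z
    return mag @ W_enc

def decode(z):                                    # latent vector z -> magnitude spectrum
    return np.maximum(z @ W_dec, 0.0)             # magnitudes are non-negative

def latent_morph(src_mag, tgt_mag, n_steps=64):
    """Decode points along a straight line in z between two sounds."""
    z_src, z_tgt = encode(src_mag), encode(tgt_mag)
    ts = np.linspace(0.0, 1.0, n_steps)[:, None]
    z_path = (1.0 - ts) * z_src + ts * z_tgt      # trajectory in latent space z
    return np.stack([decode(z) for z in z_path])  # (n_steps, N_BINS) spectrogram

# Example with random stand-in 'grains' in place of real spectra:
morphed = latent_morph(rng.random(N_BINS), rng.random(N_BINS))
```

The point is that every step along the trajectory is decoded into a genuinely new spectrum carrying traits of both sounds, rather than a weighted mix of the two source spectra, which is where the ‘crossover’ sounds come from.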
Short sequence morphs
Same as above but on short loops, hopefully showing how the spectrum morphs and grows from sound to sound rather than crossfading. This model is trained on Chopin Nocturnes, but speech is fed through it during inference, so the output has characteristics of both piano and voice.
Time-stretching aka z-trajectory stretching (aka #DeepSmear)
A bit like ‘Paulstretch’. However, instead of repeating blocks (or ‘grains’) of sound as Paulstretch does, this morphs them (as above), which you can hopefully hear as the syllables themselves seem to morph into each other. Videos below are at 600% and 1500%, with increasing amounts of grain overlap (to ‘smooth’ the sound out and minimise the tremolo effect). This is the same model from ‘Long sequence morphs’, trained on music, with my voice fed through it during inference, so the output has characteristics of both my voice and various instruments.
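Continuing the sketch above (and reusing its hypothetical encode()/decode() stand-ins), z-trajectory stretching could look roughly like this: resample the latent trajectory to the desired length so that consecutive grains morph into each other, then decode the longer trajectory. The linear interpolation scheme is an assumption for illustration, not the exact method used here.

```python
# Rough sketch of 'z-trajectory stretching' (reuses encode()/decode() from above):
# instead of repeating grains as Paulstretch does, resample the trajectory in z
# so that consecutive grains morph into each other.
import numpy as np

def stretch_z_trajectory(mag_frames, factor=6.0):
    """Stretch a sequence of magnitude frames by interpolating its z trajectory."""
    z = np.stack([encode(f) for f in mag_frames])         # (n_frames, Z_DIM) trajectory
    n_out = int(len(z) * factor)                           # e.g. factor=6.0 -> 600%
    pos = np.linspace(0, len(z) - 1, n_out)                # fractional positions along z
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, len(z) - 1)
    t = (pos - lo)[:, None]
    z_stretched = (1.0 - t) * z[lo] + t * z[hi]            # morph between neighbouring grains
    return np.stack([decode(zi) for zi in z_stretched])    # longer magnitude spectrogram
```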
Bug or feature?
Off-by-one error turns what was supposed to be speech into a 90s IDM-style loop.
Teardrop forests
Miscellaneous cross-model hacks
Running very unfamiliar sounds through models trained on very different data (e.g. Brenda Lee or Vivaldi through a model trained on Chopin Nocturnes).
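As a rough sketch of this kind of cross-model hack (again reusing the hypothetical encode()/decode() stand-ins from the morphing sketch above): take the magnitude spectrogram of the unfamiliar audio, pass each frame through the model, and resynthesise. Using librosa with Griffin-Lim phase reconstruction is an assumption for illustration, as are the FFT/hop settings and the input filename.

```python
# Rough sketch: run 'unfamiliar' audio through a model trained on different material.
# Assumes the model operates on magnitude spectrogram frames and that phase is
# recovered with Griffin-Lim; reuses the hypothetical encode()/decode() from above.
import numpy as np
import librosa

def reconstruct_through_model(y, n_fft=1024, hop=256):
    mag = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)).T  # (frames, bins)
    out = np.stack([decode(encode(frame)) for frame in mag])      # pass each frame through
    return librosa.griffinlim(out.T, n_iter=60, hop_length=hop)   # back to audio

# Hypothetical usage: e.g. speech through a model trained on Chopin Nocturnes.
y, sr = librosa.load("speech.wav", sr=None)                       # placeholder filename
y_out = reconstruct_through_model(y)
```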
Related work
Acknowledgements
Created during my PhD at Goldsmiths, University of London, funded by the EPSRC UK.