Variational Parametric Models for Audio Synthesis
Description
With the advent of data-driven statistical modeling and abundant computing power, researchers are increasingly turning to deep learning for audio synthesis. These methods attempt to model audio signals directly in the time or frequency domain. For more flexible and musically meaningful control over the generated sound, it can be more useful to work with a parametric representation of the signal that corresponds more directly to musical attributes such as pitch, dynamics, and timbre. We present VaPar Synth, a Variational Parametric Synthesizer that uses a conditional variational autoencoder trained on a suitable parametric representation. We demonstrate the model's capabilities through the reconstruction and generation of instrumental tones with flexible control over their pitch. We also investigate a parametric model for violin tones, in particular the generative modeling of the residual bow noise, to obtain a more natural tone quality. To aid our analysis, we introduce a dataset of Carnatic violin recordings, in which bow noise is an integral part of the playing style of higher-pitched notes in specific gestural contexts. We gain insight into the harmonic and residual components of the signal, as well as their interdependence, by observing the latent space obtained from variational encoding of the spectral envelopes of the sustained sounds.
Files
Name | Size | Checksum |
---|---|---|
thesis.pdf | 5.5 MB | md5:ec740c0673526c1704d3416a6a994f6f |
Additional details
Related works
- Is part of
- Conference paper: https://arxiv.org/abs/2004.00001 (URL)
- Preprint: https://arxiv.org/abs/2008.08405 (URL)
- Poster: https://arxiv.org/abs/1911.08335 (URL)