NSynth and us

Last week I was intrigued by a machine-learning model's impressive attempt to impersonate a human voice. We saw how good the model is with pure audio samples of the human voice, and how mysterious and strange it sounds with music.

That made me wonder how the NSynth pre-trained model would handle samples that are both highly expressive and human, yet also have a musical touch to them.

I have a peculiar interest in indigenous music from around the world. The idea of bringing this earthy, raw and expressive human music to the NSynth model excited me.

So I chose two types of indigenous music:

Native North American Pow-Wow:


and Maori Haka chanting:


I worked with the NSynth Colab notebook.
I wanted relatively long samples, so I ran long generation jobs on a few excerpts I curated from the full audio files.

I did the first generation with the “instruments” model and the result was poor, so for the next samples I used the “voice” model.
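The interpolation step in the NSynth workflow amounts to encoding each clip into a latent representation, blending the encodings, and decoding the blend back to audio. Here is a minimal sketch of the blending part only, assuming the encodings are NumPy arrays of shape (time, channels); the function name and the toy arrays are illustrative stand-ins, not the actual Colab code.

```python
import numpy as np

def interpolate_encodings(enc_a, enc_b, mix=0.5):
    """Linearly blend two NSynth-style latent encodings.

    enc_a, enc_b: arrays of shape (time, channels); the longer
    one is truncated so the shapes match before mixing.
    mix: 0.0 returns enc_a, 1.0 returns enc_b, 0.5 is an even blend.
    """
    t = min(len(enc_a), len(enc_b))
    return (1.0 - mix) * enc_a[:t] + mix * enc_b[:t]

# Toy stand-ins for the encodings of two source clips
pow_wow_enc = np.random.randn(1000, 16)
haka_enc = np.random.randn(1200, 16)

blended = interpolate_encodings(pow_wow_enc, haka_enc)  # 50/50 mix
```

In the actual notebook, the blended encoding is then fed to the WaveNet decoder, which synthesizes the hybrid audio sample-by-sample; that decoding step is what makes the generation so slow.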

These are the results:

original:

reconstruction:

interpolation:


original:


reconstruction:

interpolation:


original:


reconstruction:


interpolation:

