Alex Does UTAU

Description

This was originally going to go on YouTube, but like so many of my attempts to make videos, it never got properly posted on there!

The Marlowe voice bank downloads mentioned can be found here.

And I think the IPA chart I used in the video is this one!

Oh, and here's the Electric Angel UST!

Script

Hello hello! I’m Alex, you can call me snail-legs, and welcome to another video.

Today I’m here to talk about UTAU, because I’ve been learning it!! I’ve been into Vocaloid since I was like 13, and I wanted to try making stuff with vocalsynths myself. But Vocaloid is expensive, especially as I’m a hobbyist at this stuff, so I decided to start with UTAU instead - and now here we are.

If you don’t know, UTAU is a free vocalsynth program that allows you to make your own voicebanks and get them to sing you songs. It’s also very old and hasn’t been updated in a long while, so I’ve actually been using OpenUtau instead.

After playing about in the program for a bit, and possibly because I was a little overwhelmed by the amount of options available to me when it came to downloading and using voicebanks made by other users, I decided I was going to make my own. This is UTAU after all, isn’t making your own voicebank meant to be half the fun?

So I started researching. There are a lot of ways to make a voicebank, as it turns out. Let’s run through what you need to record one:

A reclist; a list of sounds to record for the language you’re recording
OREMO; software made specifically for recording UTAU voicebanks. It is possible to record in Audacity or other recording software, but I found the workflow is best using OREMO.
A voice and a microphone to record. (Technically you could make a bank out of any sounds, so I guess a voice isn’t required…)
setParam; software specifically for making an oto.ini file, which is used to configure UTAU voicebanks.
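
Side note, since the oto.ini comes up a lot: it's a plain text file with one line per sample, in the form `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, with the timing values in milliseconds. Here's a minimal Python sketch of reading one of those lines. The `OtoEntry` and `parse_oto_line` names and the sample values are made up for illustration, and treating blank numeric fields as 0 is an assumption of this sketch, not something taken from setParam or UTAU.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    """One configured sample from an oto.ini file (timings in milliseconds)."""
    filename: str
    alias: str
    offset: float
    consonant: float
    cutoff: float
    preutterance: float
    overlap: float


def parse_oto_line(line: str) -> OtoEntry:
    """Parse one 'file.wav=alias,offset,consonant,cutoff,preutterance,overlap' line."""
    # Everything before the first '=' is the sample's file name.
    filename, _, params = line.strip().partition("=")
    fields = params.split(",")
    if len(fields) != 6:
        raise ValueError(f"expected 6 comma-separated fields, got {len(fields)}")
    alias = fields[0]
    # Assumption of this sketch: a blank numeric field counts as 0.
    numbers = [float(f) if f.strip() else 0.0 for f in fields[1:]]
    return OtoEntry(filename, alias, *numbers)


# Made-up example line, not taken from a real voicebank.
entry = parse_oto_line("ka.wav=ka,45,120,-300,80,20")
print(entry.alias, entry.preutterance)
```

Otoing by hand basically means nudging those five numbers for every single sample until the consonants and vowels line up properly.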

It seemed like the easiest bank type to start with was CV, or consonant-vowel. So I decided to start with that. CV is a Japanese voicebank type; in comparison to English, the way Japanese is pronounced is much more consistent, which makes it a little easier to make functioning voicebanks for.

でも、僕は日本語が話せません。英語が話せます。

But I don’t speak Japanese. I speak English.

Even though I don’t speak Japanese, I learnt to read hiragana [¹] and made the simplest CV Japanese voicebank I could. He sounds like this.

[clip of marlowe CV singing]

Not bad for a first attempt!! But definitely something I’d like to go back to and improve on later, particularly when it comes to pronunciation.

Anyways, after I was satisfied that I understood how CV banks worked, I decided to jump into the deep end and learn how to make an English bank. As with Japanese banks, there are a number of different approaches to creating an English voicebank.

My options were as follows:

VCCV - probably some of the highest-quality resulting English voicebanks and the most user-friendly to make because of the number of resources available. 2000+ lines of recording… Optimised for an American accent, which I don’t have.
CVVC delta english - resulting banks of varying quality, but a shorter list of maybe 500 lines to record. Better clarity than a lot of ARPAsing banks. Very little English documentation; list was made by & for Japanese users.
CV-C - a shorter, multilingual list. Samples of CV-C banks singing in English sounded clear, but this list seemed like it was definitely for advanced users.
ARPAsing - quick to record, lots of resources on how to make one, but apparently hell to configure.

I didn’t feel like recording VCCV because of the length of the reclist; maybe it’d be worth it if I had a better recording setup. I didn’t really want to deal with configuring an ARPAsing bank, since it seemed even experienced users struggle with doing so. And I thought CV-C was probably best left till I had more experience in making voicebanks and using UTAU.

So I decided to make a CVVC delta English bank, relying on a translation of the original information page about the reclist and some IPA vowel charts to figure out what exactly the difference between “u” and “ʊ” was supposed to be.

Even though the list I chose only had around 500 lines to record, it took me several weeks just to finish recording. It didn’t help that I had a bit of a cough at the time.

But eventually the samples were recorded and it was time to configure the bank and see what he sounds like! After a bit of troubleshooting, I got mkototemp.exe to give me a base oto, and he sounded like this.

[clip of marlowe mkototemp oto]

Could you make out a single word of that? Because I wouldn’t be able to without knowing what he was trying to sing.

Turns out I might’ve slightly botched the recording process, but I was determined to make this work.

Still, mkototemp had given me a base to work with, and I managed to find two whole guides on how to oto this type of voicebank properly, so I spent a few hours going through all the samples and otoing them by hand :)

[clip of marlowe post 1st pass on hand-oto oto]

He sounded much clearer after that, but his plosives still needed work…

But I needed a break from otoing, so I hopped into Audacity for a bit and used a macro to apply noise removal, EQ, and normalisation to all my samples. It took a few attempts before I was happy with how he sounded.

Then, because I was really procrastinating and I’d decided I didn’t like it anymore, I redrew the artwork I’d done of Marlowe, the mascot character I made for my voicebanks.

I also drew this little halfbody as a warm-up, I guess. I think these two images took me maybe 3-4 hours… once I’d gotten going, my art flow was just really good.

I don’t have footage of any of this because I only decided I would make a video about it after doing all the work. Oops, I guess. I hope I’ve been able to make this video entertaining and/or informative even with my lack of footage.

Anyways, I finally went through and polished the oto as best I could and he honestly sounds way better than I thought he would??

[final singing clip- mini MV?]

Like he’s not perfect obviously, but in my personal opinion that’s kind of the charm of a lot of vocalsynths. Though they sound artificial and have plenty of little quirks in the way they sing, they and the music they sing were made with a lot of love and human effort. And I just think that’s really special.

Thank you for watching!

If you’d like to download Marlowe, ~~there should be a link in the description! also, links to my webcomic, my website, and my social media should you want to check any of those out.~~

Marlowe lives here, and also! You’re watching this on my website so yay! You’re already here.

Hope you enjoyed! Have a nice day :)

Footnotes:

[¹] This has since spiralled out of control into me just. Learning Japanese. A lot of the music I like is in Japanese, and it’d be cool to be able to read manga in the original language… So since I’d already put the effort into learning hiragana I figured I might as well keep going ^^'