Manual for Z Vocal Project UTAU voicebanks

Intro

Hello! This is Z Vocal Project’s tutorial for using our voicebanks in OpenUTAU. It is not meant to be an all-inclusive guide; it simply covers some things that users who are completely new to OpenUTAU may need to know. If we get anything wrong here, feel free to correct us. This guide covers installation, setup, entering lyrics, and basic tuning such as voice colors.

Definitions

We’ll be using some terms later on that may be confusing to someone who’s never touched vocal synths before, so let’s break them down here.

Standard Voicebanks

In very simple terms, “Standard” voicebanks are collections of audio samples, configured by an oto.ini file, that get strung together to form singing. They are usually only capable of singing in a single language. In OpenUTAU these voicebanks are called “Classic Voicebanks”. An example of such a voicebank would be Dunder_CV Complete. An easy hint as to whether the voicebank you’re using is a Standard voicebank is a term like “CV, VCV, CVVC, VCCV, ARPA or STD” in its name. The first five are shorthand for popular standard voicebank types, while STD is a commonly used abbreviation for Standard.
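
To make the oto.ini part a little more concrete, here is a minimal Python sketch that reads a single oto.ini entry. It assumes the commonly used UTAU layout of file.wav=alias,offset,consonant,cutoff,preutterance,overlap. The sample line and the names OtoEntry and parse_oto_line are made up for illustration and are not taken from any Z Vocal Project voicebank.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    wav: str             # sample file the entry points at
    alias: str           # name that lyrics/phonemes are matched against
    offset: float        # ms skipped at the start of the sample
    consonant: float     # ms of the fixed (unstretched) region
    cutoff: float        # ms trimmed from the end of the sample
    preutterance: float  # ms of sound placed before the note start
    overlap: float       # ms crossfaded with the previous note


def parse_oto_line(line: str) -> OtoEntry:
    """Parse one 'file.wav=alias,offset,consonant,cutoff,preutterance,overlap' line."""
    wav, _, rest = line.strip().partition("=")
    fields = rest.split(",")
    if len(fields) != 6:
        raise ValueError(f"expected 6 comma-separated fields, got {len(fields)}")
    # Assumption: an empty alias falls back to the file name without extension.
    alias = fields[0] or wav.rsplit(".", 1)[0]
    numbers = [float(f) if f else 0.0 for f in fields[1:]]
    return OtoEntry(wav, alias, *numbers)


# Hypothetical entry, not from a real voicebank.
entry = parse_oto_line("ka.wav=か,45,120,-300,80,25")
print(entry.alias, entry.preutterance)
```

When a note shows no phoneme in OpenUTAU, it usually means no entry like this exists with a matching alias.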

A.I. Voicebanks

A.I. voicebanks are voicebanks that use machine learning to reproduce the voice of their voice provider. There are two types of A.I. voicebanks in OpenUTAU: ENUNU/NNSVS and DiffSinger. For a variety of reasons ENUNU fell out of fashion, and most A.I. voicebanks released nowadays can be assumed to be DiffSinger.

ENUNU Voicebanks

ENUNU is a plugin that allows NNSVS voicebanks to be used in UTAU; we will cover their use in OpenUTAU later in this guide. ENUNU supports labeled data at the note level and can support multilingual voicebanks (though I don’t think OpenUTAU supports this). ENUNU voicebanks need less data to sound nice than DiffSinger voicebanks, but are also considered lower quality.

DiffSinger Voicebanks

DiffSinger is a shallow-diffusion-based engine that allows for high-quality singing synthesis. It supports labeled data at the speaker level and supports multilingual voicebanks. DiffSinger voicebanks require a lot more data than ENUNU voicebanks and are a lot less forgiving if you’re a bad singer.

Installation

As mentioned in the intro, our voicebanks are intended for use in OpenUTAU. You can download it here, and they even have their own tutorials that may cover things we do not. The program will update to the latest version upon launching.

To install a voicebank, navigate to the Tools tab and select “Install Singer…”.

This opens a file selector, where you select your voicebank. You do not need to unzip your voicebank; simply select the ZIP file in this menu.

Next you will be brought to the Singer Setup menu. Most Z Vocal Project voicebanks are encoded in Unicode and should be fine. In general, though, if any file names look wrong, change the file encoding in this menu until they look correct.
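
If you are curious why file names can look wrong at all, the Python sketch below shows the general idea: the same bytes read very differently depending on which encoding is assumed. The file name and the try_encodings helper are hypothetical, and this is only an illustration of the problem, not how OpenUTAU itself handles it.

```python
def try_encodings(raw: bytes, encodings=("utf-8", "shift_jis", "cp1252")) -> dict:
    """Decode the same bytes with several encodings; None means decoding failed."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Hypothetical sample name, stored the way an older Japanese voicebank might store it.
name = "かきくけこ.wav"
raw = name.encode("shift_jis")

for enc, text in try_encodings(raw).items():
    print(f"{enc:10} -> {text!r}")
```

Only one of the guesses gives back readable Kana, which is the same thing you are doing by eye when you cycle through encodings in the Singer Setup menu.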

After you hit Next, the installer may ask whether your voicebank is Classic, Enunu, or DiffSinger; click the one that corresponds to your voicebank type and proceed. If all goes well, your voicebank will be installed.

Usage

General Usage

Voicebank Setup

Your voicebank may not have everything configured correctly on first use, or may become misconfigured over time. There are also situations that require additional steps. Let’s get into all of those.

ENUNU Voicebank Setup

Before using an ENUNU voicebank, something very important must be done: you must run the official OpenUTAU ENUNU private server.

Go and download ENUNU for OpenUTAU as well as the latest version of ENUNU, since ENUNU for OpenUTAU is out of date. Extract ENUNU for OpenUTAU, then extract the latest version of ENUNU to the same folder, replacing EVERYTHING. This will likely take a long time. From now on, whenever you want to use ENUNU voicebanks you will need to run enunu_server.bat first. Loading an ENUNU voicebank without it running can cause the entirety of OpenUTAU to refuse to render until you restart it.

Phonemizer Selection

There are a variety of phonemizers, and the one you use depends entirely on the voicebank you are using.

I am now going to go through every relevant phonemizer, even though a lot of them are self-explanatory. You will generally only need to set this once, unless you are doing crosslingual work.

For Japanese Voicebanks:

Japanese CV voicebanks use the Default Phonemizer.
Japanese VCV voicebanks use the Japanese VCV Phonemizer.
Japanese CVVC voicebanks use the Japanese CVVC Phonemizer.

Japanese DiffSinger voicebanks use the DiffSinger Phonemizer or the DiffSinger Japanese Phonemizer (I do not think there’s a difference, but I could be wrong).

Japanese ENUNU voicebanks use the Enunu Phonemizer, or the Enunu Onnx Phonemizer if using an ONNX ENUNU voicebank (ENUNU fell out of use before these became widespread).

For English Voicebanks:

English VCCV voicebanks use the English VCCV Phonemizer.
English ARPA voicebanks use the English Arpasing Phonemizer.
English X-SAMPA voicebanks use the English X-SAMPA Phonemizer.
English DiffSinger voicebanks use the DiffSinger English Phonemizer.

English ENUNU voicebanks use the Enunu English Phonemizer, or the Enunu English Onnx Phonemizer if using an ONNX ENUNU voicebank (ENUNU fell out of use before these became widespread).
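
The two lists above can be summed up as a simple lookup table. The Python sketch below is only a cheat sheet: the phonemizer names are copied from this guide, while the PHONEMIZERS table and the pick_phonemizer function are hypothetical and not part of OpenUTAU.

```python
# (language, voicebank type) -> phonemizer names as listed in this guide.
PHONEMIZERS = {
    ("japanese", "cv"): ["Default Phonemizer"],
    ("japanese", "vcv"): ["Japanese VCV Phonemizer"],
    ("japanese", "cvvc"): ["Japanese CVVC Phonemizer"],
    ("japanese", "diffsinger"): ["DiffSinger Phonemizer", "DiffSinger Japanese Phonemizer"],
    # The second ENUNU entry is for ONNX ENUNU voicebanks.
    ("japanese", "enunu"): ["Enunu Phonemizer", "Enunu Onnx Phonemizer"],
    ("english", "vccv"): ["English VCCV Phonemizer"],
    ("english", "arpa"): ["English Arpasing Phonemizer"],
    ("english", "x-sampa"): ["English X-SAMPA Phonemizer"],
    ("english", "diffsinger"): ["DiffSinger English Phonemizer"],
    ("english", "enunu"): ["Enunu English Phonemizer", "Enunu English Onnx Phonemizer"],
}


def pick_phonemizer(language: str, bank_type: str) -> list:
    """Return the phonemizer names for a voicebank, or an empty list if unknown."""
    return PHONEMIZERS.get((language.lower(), bank_type.lower()), [])


print(pick_phonemizer("Japanese", "VCV"))
print(pick_phonemizer("English", "ARPA"))
```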

Crosslingual with Phonemizers

Standard Japanese voicebanks can use the English to Japanese Phonemizer to achieve a very basic form of crosslingual singing. How smooth it sounds really depends on the voicebank type.

DiffSinger voicebanks that support multiple languages should be compatible with the DiffSinger phonemizer of any language they support. To achieve crosslingual singing, change the track’s phonemizer to the target language. Note-level crosslingual is possible only if you manually enter the phonemes.

Renderer Setup

By default, OpenUTAU uses WORLDLINE-R as its renderer. If you want to use either the traditional UTAU renderer or one you’ve found online, first switch from WORLDLINE-R to CLASSIC.

Once you have swapped to CLASSIC, you can click the gear icon at the top left to open the Track Settings menu, where you can select your Resampler and Wavtool. Clicking “Location” will open your Resampler/Wavtool folder, where you can add your own Resamplers/Wavtools.

Track Setup

To start off with a blank project, go to your track and click on “Select Singer”, the first option after the track name. It will show you a list of voicebanks you have installed or used recently. Voicebanks not in your recently used list will appear in one of four menus: Classic for standard voicebanks, Enunu for ENUNU/NNSVS voicebanks, DiffSinger for DiffSinger voicebanks, and Favourites for voicebanks you have favorited. Clicking the heart icon next to a voicebank favorites it.

Click the empty space on that track to create a new part. Click on the part to enter the Piano roll where you can begin drawing notes.

Entering Lyrics

Entering lyrics differs depending on the voicebank language. As Z Vocal Project currently only offers English and Japanese voicebanks, these are the two we will be covering.

One thing they have in common is that you can extend a lyric over multiple notes by using the + symbol in place of a lyric.
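
As a rough mental model of what + does, the Python sketch below groups a row of note lyrics so that every + is counted as part of the lyric before it. The list-of-strings input and the group_extended_notes function are assumptions made for illustration; OpenUTAU’s real project format and phonemizers are more involved than this.

```python
def group_extended_notes(lyrics):
    """Group note lyrics so each '+' extends the lyric before it.

    Returns a list of (lyric, number_of_notes) tuples.
    """
    groups = []
    for lyric in lyrics:
        if lyric == "+" and groups:
            groups[-1][1] += 1          # this note continues the previous lyric
        else:
            groups.append([lyric, 1])   # this note starts a new lyric
    return [tuple(g) for g in groups]


# Four notes: "la" is held over three of them, then "li" gets one.
print(group_extended_notes(["la", "+", "+", "li"]))
# -> [('la', 3), ('li', 1)]
```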

Japanese Lyrics

Z Vocal Project Japanese voicebanks are encoded in Kana. This means that, for the most part, you enter your lyrics in Hiragana. If you notice your voicebank isn’t playing, make sure your lyrics are entered correctly. A telltale sign that the lyrics are wrong is an empty phonemizer output: this means there is no alias in the voicebank’s config file corresponding to the lyric you’ve entered.

An easy way to fix this is by going to Batch Edits -> Lyrics and selecting Romaji to Hiragana (or Hiragana to Romaji if you have the opposite issue).
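
Conceptually, that batch edit is a per-note lookup from Romaji to Kana. The Python sketch below shows the idea with a deliberately tiny table; the table and the romaji_to_hiragana function are illustrative assumptions and are not OpenUTAU’s own conversion code, which covers far more syllables.

```python
# Deliberately small sample table; a real converter covers every syllable.
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ra": "ら", "ri": "り", "ru": "る", "re": "れ", "ro": "ろ",
    "n": "ん",
}


def romaji_to_hiragana(lyric: str) -> str:
    """Convert one note's lyric; anything not in the table is returned unchanged."""
    return ROMAJI_TO_HIRAGANA.get(lyric.lower(), lyric)


notes = ["sa", "ku", "ra", "+"]
print([romaji_to_hiragana(n) for n in notes])
# -> ['さ', 'く', 'ら', '+']
```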

Do not worry about having to switch your keyboard to Japanese. If you are using a Japanese phonemizer, as you should be, typing a syllable in Romaji will automatically suggest that syllable in Kana.

English Lyrics

Inputting English lyrics in OpenUTAU is pretty simple bar one very big thing that will likely not be clear to you from the start.

When you enter a multi-syllable word, it will not be split over multiple notes automatically like in programs such as CeVIO and Vocaloid; instead, you must manually tell the program which notes to split the word over. Additionally, the notes of a multi-syllable word must be connected. To get around this, you can enter the individual phonemes yourself by putting them in brackets [].

With EN ARPA voicebanks you can extend a syllable across notes using – as well as +. This does not extend to other English Phonemizers (at least not Diffsinger).

Tuning

For the purpose of this manual, we’ll call any modification to the default output of the program Tuning. This is not an in-depth guide to OpenUTAU’s parameters, just the things that may require explanation.

Voice Colors

Voice colors are sub voicebanks you can select to alter how a note/phoneme sounds.

For Standard voicebanks these are separate UTAU voicebanks and/or prefix maps you swap between. (A prefix map is what tells a voicebank when to switch from the samples of one pitch to those of another.)
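
To illustrate what a prefix map does, here is a minimal Python sketch. It assumes the usual UTAU prefix.map layout of one tone per line followed by a tab-separated prefix and suffix. The sample map, the _low and _high suffixes, and the function names are hypothetical and not taken from any Z Vocal Project voicebank.

```python
def parse_prefix_map(text: str) -> dict:
    """Parse 'TONE<TAB>prefix<TAB>suffix' lines into {tone: (prefix, suffix)}."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Pad so that lines missing a prefix or suffix column still unpack.
        tone, prefix, suffix = (line.split("\t") + ["", ""])[:3]
        mapping[tone] = (prefix, suffix)
    return mapping


def resolve_alias(lyric: str, tone: str, mapping: dict) -> str:
    """Build the alias looked up in oto.ini for a lyric sung at a given tone."""
    prefix, suffix = mapping.get(tone, ("", ""))
    return f"{prefix}{lyric}{suffix}"


# Hypothetical two-pitch map: low samples around C4, high samples around C5.
sample_map = "C4\t\t_low\nC5\t\t_high\n"
mapping = parse_prefix_map(sample_map)
print(resolve_alias("か", "C4", mapping))
print(resolve_alias("か", "C5", mapping))
```

A voice color that swaps in a different prefix map changes which of these suffixes is picked for a note, which is how it can force a note onto a particular pitch.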

Standard voicebank colors can serve a variety of uses. For example, in voicebanks like Dunder_Fuyu Act II, whose default voicebank transitions between two pitches, voice colors may provide manual overrides to force a note to a certain pitch. In voicebanks like Dunder_CV Act II Complete, however, voice colors change the tone of voice entirely. Dunder_CV Act II Complete also contains some voice colors that change the tone of voice based on the pitch.

With A.I. voicebanks, changing the voice color switches the note to a new speaker, which is a sub-bank sampling a specific tone of voice. These transitions are a lot more natural, and while you can technically make a speaker based on a specific range of the singer, that may be better left to the Tone Shift parameter.

Swapping Voice Colors

As shown in the images above, you can swap voice colors by clicking on the CLR tab and selecting the Voice Color for each note/phoneme. This works well for small adjustments, but can be inconvenient for changing multiple notes at a time. As such, there are two other ways of changing Voice Color that you may find more convenient.

The first way is to select the notes you want to modify, click on the menu icon in the top right, navigate to expressions -> voice color, and select the voice color to set. In this tab you can also adjust variables such as vibrato, gender, velocity, tone shift, etc. en masse.

The other way is by right-clicking the voicebank’s icon and clicking on “Voice Color Remapping”. This will let you transfer the control points you’ve set for one color to a different color. This menu will automatically show up when switching from one voicebank with voice color support to another. Outside of that, this menu is very situational compared to just selecting all notes, but it is very useful if you decide you want to swap one color for another.

Pitch Editing

There are two ways to mess with the pitch of a note: either select PITD in the parameters panel at the bottom (relatively advanced), or select the Draw Pitch tool at the top and draw it directly on the piano roll.

You may also decide to alter pitch with note bending: splitting up a single note and dragging sections of it to where you want them, letting the program handle the pitch transitions for you. When tuning an existing project file with lyrics filled out, you can use the knife tool to split notes automatically where you click on them. Please note that, to my knowledge, because extending a note and splitting a word across syllables are both handled by + in OpenUTAU, multi-syllable English words can only be split on the last syllable. Splitting any other syllable will treat the last syllable as the note extension, and the tail half of the newly split note will become the syllable after it.

You can also adjust the portamento of notes with pitch control points. I think I’m missing something big here, because this system has a lot more to it in normal UTAU that OpenUTAU has seemingly scrapped. (Again, it’s probably still here; I just cannot find it.)

If you convert PITD to control points you can get the system I’m talking about back; I just cannot figure out where you add control points normally. Sorry.

Gender

Gender alters the formant of the voice (I think). You can use this to give the voicebank a higher or deeper voice. To alter the gender of a note, you can either select GEN in the parameters field or select the notes you want to alter and move the gender slider in the expressions tab.

You also have the option to alter Gender as a curve for more advanced tuning.

Vibrato

Vibrato can be edited visually by selecting the icon underneath a note. I have labeled the control points based on what they do, but you’re better off just experimenting on your own.

Instead of adjusting visually, you can also use the vibrato tool in note properties. This panel has a few extra parameters and also allows you to save vibrato presets.

Other Stuff

There are a variety of other parameters you should explore in your own time, but I am not that knowledgeable about what they do, so I will not risk misinforming you in this guide.

End of Manual

I will probably update this more later on. I’ve basically just written this to keep myself awake while I wait for something, and it is in no way a definitive OpenUTAU guide. If you have anything you think I should add, feel free to send an email.