Sinsy / Vocaloid / UTAU Workshop -- English

Sinsy, the Singing Voice Synthesis System, is free software for generating synthetic singing, similar to Yamaha's Vocaloid and UTAU by Ameya/Ayame. The graphical frontend QTAU, which will be used in this workshop, is compatible with UTAU voices and also supports concatenative synthesis.

Sinsy / Vocaloid / UTAU Workshop -- German version (translated)

Sinsy, the Singing Voice Synthesis System, is free software for generating synthetic singing, similar to VOCALOID by Yamaha and the UTAU software. The graphical frontend QTAU, which is presented in the workshop, is compatible with UTAU voices and also supports concatenative synthesis, the method used by VOCALOID (among others).

An Utauloid is a singing voice recorded by fans, usually with an anime character as its avatar. Most Utauloids can only sing in Japanese, but some also support other languages such as English or German. The workshop shows how to create your own UTAU voices, edit them with the OtoEdit plugin and publish them on the internet.
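UTAU voicebanks describe their samples in a file called oto.ini, which is presumably what the OtoEdit plugin works on. As a rough sketch, assuming the commonly used line layout file.wav=alias,offset,consonant,cutoff,preutterance,overlap (values in milliseconds), a single entry can be split apart in the shell as follows. The function name parse_oto_line and the sample entry are made up for illustration; real oto.ini files are often Shift-JIS encoded and may need converting first.

```shell
#!/bin/sh
# Hypothetical helper: split one oto.ini entry into its fields.
# Assumed layout: file.wav=alias,offset,consonant,cutoff,preutterance,overlap
parse_oto_line() {
    line=$1
    wav=${line%%=*}        # text before the first '='
    params=${line#*=}      # text after the first '='
    old_ifs=$IFS
    IFS=,
    set -f                 # no filename globbing while splitting
    set -- $params         # split the parameter list on commas
    set +f
    IFS=$old_ifs
    printf 'wav=%s alias=%s offset=%s consonant=%s cutoff=%s preutterance=%s overlap=%s\n' \
        "$wav" "$1" "$2" "$3" "$4" "$5" "$6"
}

# Made-up sample entry, not taken from a real voicebank.
parse_oto_line 'ka.wav=ka,50,120,-300,80,20'
```

Printing the fields by name like this makes it easier to spot entries where an offset or overlap value looks out of place before opening the voicebank in an editor.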

Using the VOCALOID song Sakura no Ame as an example, the workshop shows how to create a cover of a song with a Japanese UTAU voicebank in QTAU, the Rosegarden sequencer, and Ardour for the mixdown.

 

Here once again is a short overview of the programs I demonstrated at AM2022:

QTAU: my UTAU clone, a frontend for Sinsy

Rosegarden: MIDI sequencer with MusicXML support and a lyric editor

LMMS: formerly Linux MultiMedia Studio, a beginner-friendly DAW, recommended as a replacement for FL Studio.

The Institute for New Media Art Technology is known for developing MBROLA, Mage/pHTS and the ReVA-toolkit.

MBROLA is a speech synthesizer based on the concatenation of diphones, which was recently released as free software. There are several Swedish and Czech singing synthesizers based on MBROLA and the MaxMBROLA/MidiMBROLA instrument, which had been independently developed before the release of Hatsune Miku. The QTAU engine uses the MBROLA algorithm, with both UTAU and Festival voices.
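To give a feel for how a diphone synthesizer can be made to sing, the following sketch writes MBROLA's .pho input format, in which each line names a phoneme, its duration in milliseconds, and optional pairs of position (in percent) and pitch (in Hz). Holding the pitch of each phoneme at a note's frequency gives a sung rather than spoken contour. The phoneme symbols, note numbers and helper names are illustrative assumptions; the valid phoneme set depends on the MBROLA voice in use, and this is not necessarily how QTAU drives its engine.

```shell
#!/bin/sh
# Convert a MIDI note number to a frequency in Hz (A4 = note 69 = 440 Hz).
midi_to_hz() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", 440 * 2 ^ ((n - 69) / 12) }'
}

# Print one .pho line: phoneme, duration in ms, and a flat pitch contour
# pinned at 0% and 100% of the phoneme's duration.
# Arguments: phoneme, duration in ms, MIDI note number.
pho_line() {
    hz=$(midi_to_hz "$3")
    printf '%s %s 0 %s 100 %s\n' "$1" "$2" "$hz" "$hz"
}

# Illustrative phonemes only; "_" is the silence symbol in MBROLA voices.
{
    printf '_ 100\n'
    pho_line l 100 69
    pho_line a 400 69
    pho_line l 100 72
    pho_line a 400 72
    printf '_ 100\n'
}
```

Redirecting the output to a file gives something that can be passed to the mbrola command together with a voice database, provided the phoneme names are adjusted to that voice.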

They also extended the HTS Engine, which also powers Sinsy. This project is called Mage/pHTS and it works best with the English Festival voices, which have been converted to HTS format. The OLABuffer used by QTAU is also derived from Mage.

Finally, there is the ReVA-toolkit, the Reactive Virtual Agent toolkit for human-agent interaction applications. It seems to be similar to MikuMikuDance (MMD) and MMDAgent, but it is implemented using a patched version of the Godot engine.

To install QTAU and its dependencies on Manjaro, run the following commands:

#first install global dependencies

sudo pacman -S git gcc make cmake pkg-config autoconf automake libtool

#create a source directory where you clone the repositories
#(-p avoids an error if ~/src already exists):
mkdir -p ~/src
cd ~/src

#install all dependencies

sudo pacman -S pkg-config libsndfile boost gsl libsmf qjackctl

git clone https://github.com/espeak-ng/espeak-ng
pushd espeak-ng
./autogen.sh
./configure
make
sudo make install
popd

sudo pacman -S libsndfile boost-libs boost gsl
git clone https://notabug.org/isengaara/sekai
pushd sekai
mkdir build
pushd build
cmake ..
make
sudo make install
popd
popd

git clone https://github.com/r9y9/hts_engine_API
pushd hts_engine_API/src
./waf configure
./waf
sudo ./waf install
popd
 
git clone https://notabug.org/isengaara/sinsy
pushd sinsy
mkdir build
pushd build
cmake ..
make
sudo make install
popd
popd

#install QTAU

git clone https://notabug.org/isengaara/qtau
pushd qtau
mkdir build
pushd build
qmake ..
make
make install
popd
popd
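sekai and sinsy are built with the same clone, configure, build, install sequence, so those steps can be folded into a small helper. This is only a sketch: the function names run and build_cmake_repo and the DRY_RUN switch are illustrative additions, not part of QTAU. With DRY_RUN=1 the commands are printed instead of executed, so the sequence can be inspected first. The cmake -S/-B options need CMake 3.13 or newer.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Clone a repository and build it out of tree with CMake.
# The directory name is derived from the last component of the URL.
build_cmake_repo() {
    url=$1
    dir=${url##*/}
    run git clone "$url" &&
    run mkdir -p "$dir/build" &&
    run cmake -S "$dir" -B "$dir/build" &&
    run make -C "$dir/build" &&
    run sudo make -C "$dir/build" install
}

# Print what would happen for the two CMake-based dependencies.
DRY_RUN=1
build_cmake_repo https://notabug.org/isengaara/sekai
build_cmake_repo https://notabug.org/isengaara/sinsy
```

Setting DRY_RUN=0 runs the same commands for real; because the steps are chained with &&, a failed clone or build stops the helper before the install step.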

There are two projects by Free(B)Soft which QTAU builds on: The Singing Computer and Festival Czech.

Singing Computer

The aim of the Singing Computer project is to take another step towards making music typesetting accessible to visually impaired users by making it possible to check the lyrics they have entered. LilyPond is a popular music typesetting tool based on plain text input (similar to the TeX typesetting system). This way of working lets visually impaired users enter music relatively comfortably in their text editors.

An important part of music typesetting is checking the result. This is easy for sighted users, who can simply check the output on screen or printed on paper, but visually impaired users cannot do that. Even if accessibility tools were able to describe the positions of notes and lyrics on the screen, checking the score that way would be very tedious. Visually impaired users therefore have to use other tools.

While it is difficult for visually impaired users to work with visually represented information, it is usually easy for them to work with information represented as sound, and music typesetting tools can be adapted accordingly. LilyPond already solves part of the problem: it offers MIDI output of the music, so by listening to it a visually impaired user can check that the notes are correct. What is still missing is a way to check that the lyrics were entered correctly as well. This is what Singing Computer solves.

The original Singing Computer uses the Festival speech synthesizer to make the computer sing the lyrics written in the LilyPond input file. This way a visually impaired user can check the lyrics and their alignment with the notes, and can detect and correct mistakes made while writing the LilyPond input file. With the help of this tool, music typesetting becomes almost completely accessible.
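For readers who have not seen LilyPond input, the following sketch writes a minimal score with a melody, lyrics aligned to it, and a \midi block, which is the kind of plain-text file discussed here. The file name song.ly and the notes are arbitrary, and the commands specific to Singing Computer are deliberately left out because they are not shown in this text.

```shell
#!/bin/sh
# Write a minimal LilyPond file (song.ly is an arbitrary name).
# The quoted 'EOF' keeps the shell from interpreting the backslashes.
cat > song.ly <<'EOF'
\version "2.22.0"
\score {
  <<
    \new Voice = "melody" \relative c' { c4 d e f | g2 g2 }
    \new Lyrics \lyricsto "melody" { do re mi fa sol sol }
  >>
  \layout { }
  \midi { }
}
EOF

# Rendering needs LilyPond installed; it produces a PDF and a MIDI file.
if command -v lilypond >/dev/null 2>&1; then
    lilypond song.ly
fi
```

Each syllable in the Lyrics context is attached to one note of the melody, so a missing or extra syllable shifts all the following ones, which is exactly the kind of mistake that is easier to hear than to describe.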

The new release (called robotník Utaite) is based on a fork of Sinsy, a Japanese singing voice synthesis system, which replaces Festival. Using Sinsy makes it possible to use UTAU and eSpeak voices, which sound more natural and cover many languages not supported by Festival. You can also use existing Festival or MBROLA voices; future versions may be compatible with NNSVS.

Festival Czech

Festival Czech provides Czech language support for the Festival speech synthesis system. Together with Festival and a Czech diphone database, it provides complete speech output in Czech.

QTAU is a piano roll editor and multilingual voice synthesizer compatible with UTAU, Festival, eSpeak and MBROLA.

QTAU News:

At FrOSCon I (isengaaraP) did a LilyPond workshop which also included singing voice synthesis.

As part of OpenRheinRuhr, I will give a short introduction to QTAU.

The next workshop is planned for the Anime Marathon convention in 2022.

QTAU IRC Chat:

There are two IRC chats for QTAU. isengaara will be in both chats every Sunday from 18:00 to 21:00 Berlin time and during the workshops. Feel free to join the chats at any time, even when isengaara is offline.

  • #qtau [English Chat]
  • #qtauger [German Chat]

Retroshare: