Fucked Up and Bad AI

LyingBard

LyingBard is an attempt to reconstruct LyreBird, a discontinued AI TTS system from a couple of years ago.

I'll try to post a new speech generation every so often to give a better idea of how well it's going.

The prompt is "THE NORTH WIND AND THE SUN WERE DISPUTING WHICH WAS THE STRONGER WHEN A TRAVELER CAME ALONG WRAPPED IN A WARM CLOAK."

The speaker is supposed to be me. I'm not in the training set, so this is a full test: if it can emulate my voice, I will have succeeded in replicating LyreBird. I'm pretty sure I'll have to add a feature from the paper I'm referencing to get actually intelligible output. In the meantime, I'm not comfortable releasing synthesized clips of any speakers without their permission, so enjoy this sampling of speech-like noise.

The audio is hosted on Google Drive btw, so loading this page too many times will hit the rate limit and prevent you from listening to the audio. You can still download the audio by right-clicking it and saving it.

Epoch 206

Griffin-Lim (normal spectrogram inversion)

MelGAN (neural net from LyreBird)

Things are taking a long time, but you can just barely hear it. The paper I'm referencing trains for 250,000 steps, so at 445 steps per epoch and 15 minutes per epoch we'll be waiting until around epoch 560, which is roughly 140 hours. Ooof.
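For anyone who wants to check that estimate, here's a quick sketch of the arithmetic. The three numbers come straight from the paragraph above; the function name is just made up for illustration.

```python
import math

TARGET_STEPS = 250_000    # step count quoted from the reference paper
STEPS_PER_EPOCH = 445     # steps in one epoch of this training run
MINUTES_PER_EPOCH = 15    # observed wall-clock time per epoch


def training_estimate(target_steps, steps_per_epoch, minutes_per_epoch):
    """Return (epochs needed, total hours) to reach target_steps."""
    # Round up: a partial epoch still has to be run to hit the target.
    epochs = math.ceil(target_steps / steps_per_epoch)
    hours = epochs * minutes_per_epoch / 60
    return epochs, hours


epochs, hours = training_estimate(TARGET_STEPS, STEPS_PER_EPOCH, MINUTES_PER_EPOCH)
print(f"{epochs} epochs, about {hours:.1f} hours")  # 562 epochs, about 140.5 hours
```

So the exact figure lands a couple of epochs past 560, which is close enough for a "how long will I be waiting" estimate.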

With the training time almost up, I'm going to make a preemptive call that this isn't going to get too much better. So I'm calling this one

THE HORRORS

I'll be posting this one online as free to access and hopefully making an in-browser app to use it. The speech is unintelligible and unrecognizable enough that I don't think there's any risk of impersonation. I still have to try adding fine-tuning and seeing if that works, but I'm gonna guess it won't. I've learned a lot making this, so get ready for version 2 soon!

  • You'll need to download all the dependencies. Look at the top of all the .py files to find them all.
  • Put the models in AppData/Local/FUBAI/LyingBard or your OS's equivalent. The model with the highest number is the one that will be used.
  • The number at the end of a model is the iteration count. To get the iteration count of your favorite epoch featured below, multiply the epoch by 445.
  • Don't forget to suffer like I did :)
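The two rules about model numbers in the list above can be sketched roughly like this. This is not LyingBard's actual loading code: the helper names and the file names in the usage example are hypothetical, and it assumes the iteration count is the run of digits at the very end of the file name, before the extension.

```python
import re
from pathlib import Path

STEPS_PER_EPOCH = 445  # from the training run described on this page


def epoch_to_iteration(epoch):
    """Convert an epoch number into the iteration count used in model names."""
    return epoch * STEPS_PER_EPOCH


def trailing_number(path):
    """Return the integer at the end of the file stem, or None if there is none."""
    match = re.search(r"(\d+)$", Path(path).stem)
    return int(match.group(1)) if match else None


def pick_latest_model(paths):
    """Return the path whose trailing number is highest, or None if none match."""
    numbered = [(trailing_number(p), p) for p in paths]
    numbered = [(n, p) for n, p in numbered if n is not None]
    if not numbered:
        return None
    # Compare as integers so that 10 beats 9 (a plain string sort would not).
    return max(numbered, key=lambda pair: pair[0])[1]


# Hypothetical file names, only to show the behavior.
print(epoch_to_iteration(410))  # 182450
print(pick_latest_model(["model_182450.pt", "model_247865.pt", "notes.txt"]))
```

To use a particular epoch, the simplest approach under these assumptions is to keep only that model file in the folder, so it's the highest-numbered one by default.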

WARNING

The audio below is HOSTILE and at any time could release a SONIC ATTACK. Prepare your ears and watch your volume!

Epoch 557

Epoch 540

Epoch 512

Epoch 500

Epoch 460

Epoch 440

Epoch 410

holy crap it actually said the whole thing!

The audio files below are float WAV files with sample values higher than 1. They may play loud and distorted on some devices.
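If you download one of these and want to tame it, peak normalization is the usual fix: scale every sample so the loudest one sits at exactly 1. Here's a minimal sketch on a plain list of floats. The function name is made up, and reading and writing the WAV file itself is left out.

```python
def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or an empty clip) has nothing to scale.
        return list(samples)
    scale = target_peak / peak
    return [s * scale for s in samples]


too_loud = [0.5, -2.0, 1.5]      # float samples that go past the [-1, 1] range
print(peak_normalize(too_loud))  # [0.25, -1.0, 0.75]
```

The same scaling works in the other direction too: a clip whose peak is far below 1 gets boosted up to full scale.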

Epoch 399

Epoch 351

Epoch 301

Epoch 265

The ones below don't auto-stop so only the first 10 or so seconds are actually speech. The rest is horrific vocoder speech ambience.

Epoch 240

Epoch 215

This is starting to sound really freaky...

Epoch 203

omfg listen to this!

Epoch 191

omfg listen closely at 0:08 for "and the sun were disp-"

Epoch 162

Epoch 100

The ones below here are not normalized, so they're really quiet. Turn your volume up.

Epoch 40

Epoch 10

<|°_°|> <|°_°|> <|°_°|> <|°_°|> <|°_°|> <|°_°|> <|°_°|> <|°_°|> <|°_°|> <|°_°|>