No need to be invited. Between their GitHub[1] page and the documentation[2], yo...

forgingahead · on Nov 3, 2022

How long did you train the models for each speaker, and what hardware were you using?

jamez · on Nov 4, 2022

It was fine-tuning, so the process was a lot faster than I originally anticipated. I'd say it was between 36 and 72 hours for each voice. I have been working in a Gradient notebook from Paperspace, which guaranteed me A6000 instances (48 GB of GPU RAM) at a reasonable flat rate. I discovered them after being repeatedly frustrated by the random GPU allocation on Colab's Pro+ plan.

hanselot · on Nov 3, 2022

How much input audio would you need to produce audiobook-quality output? Hint, hint...