
> throwing away information that our hearing systems can't process very well is not particularly useful if your goal is speech (or more generally, audio) synthesis from scratch.

I'm not sure I follow. If there is a set of tokens that the average human cannot perceive, why wouldn't we want to eliminate them from the search space? Who is the target audience for this model?



Humans who read (at least) Indo-European languages can read texts in their native language with all the vowels removed. Does that suggest it would be a good idea to remove the vowels from text before using it to train text-based LLMs?

Presumably you want to train on as rich a set of data as possible, even if some of that data is redundant or irrelevant when it comes to human perception.
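
As a toy sketch of the vowel-removal idea (my own illustration in Python; strip_vowels is a made-up helper, not anything from the model under discussion):

    import re

    def strip_vowels(text: str) -> str:
        # Drop every vowel that isn't word-initial, so the result
        # stays (mostly) decipherable to a fluent reader.
        return re.sub(r"\B[aeiouAEIOU]", "", text)

    print(strip_vowels("you want to train on as rich a set of data as possible"))
    # -> "y wnt t trn on as rch a st of dt as pssbl"

The point being: a human can still read the output, but a model trained on it sees strictly less information than one trained on the original text.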


Generally, the difference between regional dialects is almost entirely in the vowels (see [0] on the pin-pen merger). This is why SOUNDEX [1] discards vowels (except a word's first letter); a minimal implementation is sketched below the links.

0 - https://www.acelinguist.com/2020/01/the-pin-pen-merger.html

1 - https://en.wikipedia.org/wiki/Soundex
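
For concreteness, here's a minimal sketch of American Soundex as described at [1] (my own code, spot-checked against the examples on that page):

    CODES = {
        **dict.fromkeys("bfpv", "1"),
        **dict.fromkeys("cgjkqsxz", "2"),
        **dict.fromkeys("dt", "3"),
        "l": "4",
        **dict.fromkeys("mn", "5"),
        "r": "6",
    }

    def soundex(name: str) -> str:
        # Keep the first letter; vowels (and y, h, w) contribute no
        # digits; other consonants map to digits; pad/truncate to 4 chars.
        name = "".join(c for c in name.lower() if c.isalpha())
        if not name:
            return ""
        digits = []
        prev = CODES.get(name[0], "")        # code of the kept first letter
        for ch in name[1:]:
            code = CODES.get(ch, "")
            if code and code != prev:        # collapse adjacent duplicates
                digits.append(code)
            if ch not in "hw":               # h/w don't break a duplicate run
                prev = code
        return (name[0].upper() + "".join(digits) + "000")[:4]

    assert soundex("Robert") == "R163"
    assert soundex("Tymczak") == "T522"
    assert soundex("Ashcraft") == "A261"

Note that vowels only serve to separate duplicate consonant codes; they never appear in the output themselves, which is why dialectal vowel shifts don't change a name's code.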


Maybe because things outside our audible range can impact or influence things inside our audible range?
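
One concrete mechanism (my own illustration, not necessarily what the parent means): ultrasonic content that's inaudible at a high sample rate aliases straight into the audible band if the pipeline downsamples naively. A quick numpy sketch:

    import numpy as np

    fs_high, fs_low = 96_000, 16_000             # source and target rates
    t = np.arange(fs_high) / fs_high             # 1 second of samples
    ultrasonic = np.sin(2 * np.pi * 30_000 * t)  # 30 kHz tone: inaudible

    # Naive decimation (no anti-aliasing filter): keep every 6th sample.
    decimated = ultrasonic[::6]

    # The 30 kHz tone folds around the new Nyquist frequency (8 kHz):
    # |30_000 - 2 * 16_000| = 2 kHz, squarely in the audible band.
    spectrum = np.abs(np.fft.rfft(decimated))
    peak_hz = np.argmax(spectrum) * fs_low / len(decimated)
    print(f"alias lands at ~{peak_hz:.0f} Hz")   # ~2000 Hz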


I imagine it would be like having Rosetta Stones of text, written in one language you could read and one you couldn't. For your purposes, discarding the text you can't read would be fine and you wouldn't lose anything. But if you were feeding a large corpus into an LLM, the additional text would give the model more context and help it make connections and relate words more accurately, even if you never intended it to output anything in the language you don't understand.

The inaudible sounds add context and additional datapoints on how the audible sounds are related.



