UNA: Uniform Neural Alignment.
Haven't you noticed yet? Each model that I uniform behaves like a pre-trained model, and you can likely fine-tune it again without damaging it.
If you've chatted with them, you know that strange sensation, and you know what it is: intelligence.
Xaberius-34B is the highest performer on the board, and it is NOT contaminated.
In addition to what was said, if it's anything like DPO, you don't need a lot of data, just a good set. For instance, DPO requires a "good" and a "bad" response for each given prompt.
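To make the "good" and "bad" response idea concrete, here is a rough sketch of what a DPO-style preference set could look like, plus the standard DPO loss for a single pair. The field names `prompt`, `chosen`, and `rejected` are a common convention that I'm assuming here, the example data is made up, and the helpers `validate_pairs` and `dpo_loss` are hypothetical names for illustration. None of this is confirmed to be how UNA works.

```python
import math

# Made-up preference pairs: one "good" (chosen) and one "bad" (rejected)
# response per prompt. Field names are an assumed convention.
preference_pairs = [
    {
        "prompt": "What is the capital of France?",
        "chosen": "The capital of France is Paris.",
        "rejected": "France does not have a capital.",
    },
    {
        "prompt": "Explain what overfitting means.",
        "chosen": "The model memorizes training data and generalizes poorly.",
        "rejected": "It means the model is too small.",
    },
]


def validate_pairs(pairs):
    """Keep pairs with non-empty fields and two different responses."""
    valid = []
    for pair in pairs:
        fields = [pair.get(key, "") for key in ("prompt", "chosen", "rejected")]
        if all(fields) and pair["chosen"] != pair["rejected"]:
            valid.append(pair)
    return valid


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair, given sequence log-probabilities.

    loss = -log(sigmoid(beta * margin)), where margin is how much more
    the policy prefers chosen over rejected than the reference model does.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) is the same as log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


clean_pairs = validate_pairs(preference_pairs)
# Illustrative log-probabilities only, not measured from any model.
example_loss = dpo_loss(-12.0, -20.0, -14.0, -15.0)
```

The loss drops below log(2) once the policy favors the chosen response more strongly than the reference does, which is why the quality of the pairs matters more than the size of the set.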
Actually I am testing the 34B myself (not the 7B), and it seems good.