chaeronanaut's comments

> The words that are coming out of the model are generated to optimize for RLHF and closeness to the training data, that's it!

This is false: reasoning models are rewarded/punished based on performance at verifiable tasks, not human feedback or next-token prediction.


How does that differ from a non-reasoning model rewarded/punished based on performance at verifiable tasks?

What does CoT add that enables the reward/punishment?


Without CoT, training them to give specific answers reduces performance. With CoT, you can punish them for not giving the exact answer you want without hurting performance, since the reasoning tokens help them figure out how to answer the question and what the answer should be.

And you really want to train on specific answers since then it is easy to tell if the AI was right or wrong, so for now hidden CoT is the only working way to train them for accuracy.
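
To make "easy to tell if the AI was right or wrong" concrete, here is a minimal sketch of an exact-match reward. Everything in it is an assumption for illustration: the `Answer:` marker, the function names `final_answer` and `reward`, and the 1.0/0.0 values are hypothetical and not taken from any real training setup. The point is only that the reasoning text before the marker is never graded.

```rust
/// Returns the text after the last "Answer:" marker, trimmed.
/// The marker is a hypothetical output format chosen for this sketch.
pub fn final_answer(completion: &str) -> Option<&str> {
    completion
        .rsplit_once("Answer:")
        .map(|(_reasoning, tail)| tail.trim())
}

/// Exact-match reward: 1.0 if the final answer equals `expected`, else 0.0.
/// The reasoning before the marker is ignored entirely, so the model is free
/// to use those tokens however it likes.
pub fn reward(completion: &str, expected: &str) -> f32 {
    match final_answer(completion) {
        Some(answer) if answer == expected => 1.0,
        _ => 0.0,
    }
}
```

For example, `reward("17 * 3 = 51, plus 4 is 55. Answer: 55", "55")` gives 1.0, while a completion with a different final answer, or with no marker at all, gives 0.0.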


BT2 is old news, we have BT4 now


An excellent explanation of Magic Bitboards can be found here: https://analog-hors.github.io/site/magic-bitboards/


this pretty much summarises my opinion - one nitpick - i assume you meant "omit bounds and other checks", not "emit bounds and other checks" which seems to mean the opposite of what you're intending


Rust does emit bounds and other checks, though. Optimization passes can usually clear some of them away, but you'd need to check the assembly output to be sure.


Yes, that's "omit".

"Emit" means "to send out", e.g. "emit a strange noise", "emit radiation".


It is specifically both:

- when accessing an arbitrary element in a slice, the compiler will emit a bounds check (`if index >= len: panic()`) to avoid an uncontrolled out-of-bounds memory access: https://godbolt.org/z/cbY5ebzvK (note how, if you comment out the assert, the code barely changes, because the compiler is adding an invisible assert of its own)

- if the compiler can infer that `index` will always be less than `len`, then it will omit the bounds check: https://godbolt.org/z/TTashYnjd
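
The two cases above can be sketched roughly as follows. The function names are made up for illustration, and whether a check is actually elided depends on the compiler version and optimization level, so the assembly is still the only way to be sure.

```rust
/// Arbitrary index: the compiler has to emit a bounds check here, and an
/// out-of-range `index` panics instead of reading out-of-bounds memory.
pub fn get_arbitrary(values: &[u32], index: usize) -> u32 {
    values[index]
}

/// The explicit guard already establishes `index < values.len()` on the
/// indexing path, so the optimizer can usually omit the implicit check.
pub fn get_guarded(values: &[u32], index: usize) -> Option<u32> {
    if index < values.len() {
        Some(values[index])
    } else {
        None
    }
}

/// Iterating avoids indexing altogether, so there is no per-element
/// bounds check to emit in the first place.
pub fn sum_all(values: &[u32]) -> u32 {
    values.iter().sum()
}
```

Note that `get_guarded(&[10, 20, 30], 3)` returns `None`: an index equal to the length is already out of bounds, which is why the check is `>=` rather than `>`.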


Yes, thank you! That's an embarrassing typo!

(And thanks to the other person as well, who presumably deleted the same comment after seeing yours.)

