Even Python gets a bit touchy if you ask it to avoid common packages.
An example is asking for a simple Kalman filter, limited to 2x2 matrices to avoid the need for LU decomposition, and then adding a constraint to the prompt not to use NumPy, which almost everything in the corpus does.
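For concreteness, here is a minimal sketch of roughly what that prompt asks for. It assumes a 2-element state and a 2-element measurement so that the innovation covariance is 2x2 and can be inverted in closed form. All function names (`kalman_step`, `inv2`, and the small matrix helpers) are my own illustrative choices, not from any library.

```python
# Pure-Python 2x2 Kalman filter sketch (no NumPy).
# Matrices are nested lists [[a, b], [c, d]]; vectors are 2-element lists.
# All names here are hypothetical helpers for illustration only.

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def mat_mul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_add(a, b):
    """Element-wise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def mat_sub(a, b):
    """Element-wise difference of two 2x2 matrices."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def transpose(a):
    """Transpose of a 2x2 matrix."""
    return [[a[j][i] for j in range(2)] for i in range(2)]


def mat_vec(a, v):
    """Multiply a 2x2 matrix by a 2-element vector."""
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def inv2(m, eps=1e-12):
    """Closed-form 2x2 inverse; raises instead of dividing by (near) zero."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < eps:
        raise ValueError("matrix is singular or nearly singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def kalman_step(x, P, z, F, Q, H, R):
    """One predict + update cycle; returns the new (state, covariance)."""
    # Predict: x' = F x, P' = F P F^T + Q
    x_pred = mat_vec(F, x)
    P_pred = mat_add(mat_mul(mat_mul(F, P), transpose(F)), Q)

    # Innovation y = z - H x' and its covariance S = H P' H^T + R
    Hx = mat_vec(H, x_pred)
    y = [z[0] - Hx[0], z[1] - Hx[1]]
    S = mat_add(mat_mul(mat_mul(H, P_pred), transpose(H)), R)

    # Kalman gain K = P' H^T S^-1, using the closed-form inverse
    K = mat_mul(mat_mul(P_pred, transpose(H)), inv2(S))

    # Update: x = x' + K y, P = (I - K H) P'
    Ky = mat_vec(K, y)
    x_new = [x_pred[0] + Ky[0], x_pred[1] + Ky[1]]
    P_new = mat_mul(mat_sub(IDENTITY, mat_mul(K, H)), P_pred)
    return x_new, P_new
```

The explicit determinant check in `inv2` is the place where a divide by zero would otherwise sneak in, so the sketch raises there rather than returning garbage.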
Even with LRMs, having a high enough top-k accuracy, so that at least one of the k guesses is a correct solution, seems to be the trick.
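A back-of-the-envelope way to see why k matters, as a sketch only: it assumes each attempt is correct independently with the same probability p, which is almost certainly too optimistic for samples drawn from one model on one prompt. The function names are made up for illustration.

```python
def prob_at_least_one_correct(p, k):
    """P(at least one of k attempts is correct), assuming independent attempts."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


def attempts_needed(p, target, max_k=10_000):
    """Smallest k reaching the target success probability, or None if not
    reached within max_k attempts (assumption: same independence model)."""
    for k in range(1, max_k + 1):
        if prob_at_least_one_correct(p, k) >= target:
            return k
    return None
```

Under that assumption, a per-attempt success rate of 0.2 needs 11 attempts to reach a 90% chance of at least one correct answer (`attempts_needed(0.2, 0.9)`), and when the per-attempt rate drops toward zero, no practical k helps.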
Perhaps Python+NumPy is effectively its own language, but the errors without NumPy seem really trivial, similar to what one would see in an obscure language. It differs across models, but the failure modes I have seen range from getting stuck generating verification code that divides by zero to giving up and producing code that uses NumPy anyway.
Professor Subbarao Kambhampati's explanation really helps here IMHO.
https://bsky.app/profile/rao2z.bsky.social/post/3lkjnrrv2qk2...
"Compiling the signal verifier" in is, at least superficially to me, a good intuition for where these fail.
The limits of top-k and the heavy-tail dependence in many tasks will be painful. I think we will need more expertise among programmers, not less, simply because of our human failings and our overtrust of automation.
How we change the career path to develop tacit and technical abilities is a big question for me personally.