I have completely different experience.
Which models are you talking about? I have no trouble at all with AI documenting the steps it took. I use codex gpt5.4 and Claude code opus 4.6 daily. When needed - they have no issue with describing what steps they took, what were the problems during the run. Documenting that all as a SKILL, then reuse and fix instructions on further feedback.