You could get this working very consistently with GPT-4 in mid-2023. The version before June, iirc. No JSON output, no tool-calling fine-tuning... just half a page of instructions and some string matching code. (Built a little AI code editing tool along these lines.)
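Here is a minimal sketch of what the "string matching code" side of such a setup might look like. It assumes the instructions ask the model to wrap each edit in conflict-marker-style blocks. The block format, `Edit`, `parse_edits`, and `apply_edits` are all hypothetical and are not the commenter's actual tool. The model replies in free text, a regex pulls out the blocks, and each edit is applied by exact substring match, failing loudly when the model's "original" text doesn't match the file.

```python
import re
from dataclasses import dataclass

# Hypothetical edit-block format that the prompt instructions would ask the model to emit.
BLOCK_RE = re.compile(
    r"<<<<<<< ORIGINAL\n(.*?)\n=======\n(.*?)\n>>>>>>> UPDATED",
    re.DOTALL,
)


@dataclass
class Edit:
    original: str  # text the model claims is currently in the file
    updated: str   # text it should be replaced with


def parse_edits(reply: str) -> list[Edit]:
    """Extract every edit block from a free-text model reply, ignoring surrounding chatter."""
    return [Edit(m.group(1), m.group(2)) for m in BLOCK_RE.finditer(reply)]


def apply_edits(source: str, edits: list[Edit]) -> str:
    """Apply edits by exact string matching; raise if a block doesn't match the source."""
    for edit in edits:
        if edit.original not in source:
            raise ValueError(f"original text not found: {edit.original!r}")
        # Replace only the first occurrence so one edit can't clobber several spots.
        source = source.replace(edit.original, edit.updated, 1)
    return source


if __name__ == "__main__":
    reply = (
        "Sure, here is the fix:\n"
        "<<<<<<< ORIGINAL\nreturn a - b\n=======\nreturn a + b\n>>>>>>> UPDATED\n"
    )
    code = "def add(a, b):\n    return a - b\n"
    print(apply_edits(code, parse_edits(reply)))
```

The raised `ValueError` is the piece that makes this workable without structured outputs: a malformed or hallucinated edit is caught immediately and can be retried, rather than silently corrupting the file.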
With tool-calling RL and structured outputs, I think the main benefit is peace of mind. You know you're going down the happy path, so there's one less thing to worry about.
I've been using structured outputs pretty extensively for a while now, and my impression is that newer models take less of a quality hit when conforming to a specific schema. Just giving instructions and output examples totally worked, but it came at a considerable cost to output quality. That effect seems to have diminished over time as models have been more explicitly trained to produce structured output.
Reliability is the final frontier!