How long does it take the operator to sift through those 10,000 attempts to find the successful one, when it's not a contrived benchmark where the desired answer is already known ahead of time? LLMs generally don't know when they've failed, they just barrel forwards and leave the user to filter out the junk responses.