Microbenchmarks are hard. You aren't doing any meaningful work that could benefit from speculatively executing instead of stalling for the conditional value.
Similarly, you might be busting the pipeline by packing the jumps so close together.
Not saying your point is wrong, just saying your proof isn't super solid.
In this benchmark the only loop-carried dependency is over the res variable (edit: and of course the index). The jump doesn't break these dependencies, so for this specific problem the additional latency of the cmov doesn't matter: it is always perfectly pipelined, and cmov will always come out on top. But if the input of the cmov depended on a previous value, then a branch could potentially be better, given a high enough prediction rate.
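
To make the distinction concrete, here is a rough sketch in C of the two situations. The function names and the exact loop bodies are made up for illustration and are not the benchmark being discussed. Whether the compiler actually emits a cmov or a branch for the ternaries depends on the compiler, flags, and target, so check the generated assembly before drawing conclusions.

```c
#include <stddef.h>
#include <stdint.h>

/* Case 1 (hypothetical): the condition depends only on data[i].
   The only loop-carried values are `res` and the index, so the selects
   of successive iterations can overlap and the select's latency is hidden. */
uint64_t sum_if_below(const uint32_t *data, size_t n, uint32_t limit) {
    uint64_t res = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t add = (data[i] < limit) ? data[i] : 0; /* may become a cmov */
        res += add;
    }
    return res;
}

/* Case 2 (hypothetical): the condition depends on the previous iteration's
   result `x`. If this compiles to a cmov, the add, compare, and select all
   sit on the loop-carried chain. If it compiles to a branch that is almost
   always predicted correctly, the core can speculate past the compare. */
uint32_t wrap_walk(const uint32_t *step, size_t n, uint32_t cap) {
    uint32_t x = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t next = x + step[i];
        x = (next > cap) ? 0 : next; /* input depends on the previous x */
    }
    return x;
}
```

In the second loop, a rarely taken reset (large cap relative to the steps) is the kind of input where a well-predicted branch could plausibly beat the cmov, while unpredictable resets would favor the cmov again.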