In my experience, when presented with a failing test it would simply try to make the test pass instead of determining why the test was failing. Usually by hard coding the test parameters (or whatever) in the failing function... which was super annoying.
I once saw probably 10 iterations to fix a broken test, then it decided that we don't need this test at all, and it tried to just remove it.
IMO, you either write tests and let it write implementation or write implementation and let it write tests. Maybe use something to write tests, then forbid "implementor" to modify them.