Basically, if you have the actual "factual" information, use it directly instead of hoping the LLM will accurately extract it and pass it along as part of a function call. In this case they already know what the accurate URLs are, so just use them.
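Something like this is what I have in mind; the lookup table and key names are made up just to show the shape of it:

```python
# Minimal sketch: the model only picks an item by a short key, and the real
# URL comes from data we already trust, not from text the model retyped.
KNOWN_URLS = {  # hypothetical lookup table of URLs we already have on hand
    "docs": "https://example.com/docs",
    "billing": "https://example.com/billing",
}

def open_page(page_key: str) -> str:
    """Tool handler: resolve the key ourselves instead of trusting a
    model-generated URL string."""
    url = KNOWN_URLS.get(page_key)
    if url is None:
        raise ValueError(f"Unknown page key: {page_key!r}")
    return url  # ground-truth URL, never one the model reproduced
```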
Where I currently work, our function calls regularly fail only to succeed flawlessly on a retry. (I believe we're on the order of tens of millions of OpenAI calls a day.)
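For what it's worth, the workaround is just a dumb retry loop around the call. A rough sketch using the OpenAI Python SDK; the model name and schema details are placeholders, not our actual setup:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def call_with_retry(messages, tools, max_attempts=3):
    """Re-issue the same request until the model returns a tool call whose
    arguments actually parse as JSON."""
    for _ in range(max_attempts):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=messages,
            tools=tools,
        )
        tool_calls = response.choices[0].message.tool_calls
        if not tool_calls:
            continue  # model answered in prose instead of calling the tool
        try:
            return json.loads(tool_calls[0].function.arguments)
        except json.JSONDecodeError:
            continue  # malformed arguments; this is the flaky case, retry
    raise RuntimeError("tool call failed on every attempt")
```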
These are non-deterministic systems. I wouldn't even trust them to accurately extract text unless you did a beam search or something similar to average out different LLM outputs.
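By "average out" I mean something crude like sampling the same extraction several times and keeping the most common answer; a rough sketch, with the actual LLM call left as a stand-in:

```python
from collections import Counter

def majority_extract(sample_once, n_samples=5):
    """Crude stand-in for beam search / self-consistency: run the same
    extraction several times and keep the answer that appears most often.
    `sample_once` is any callable that does one LLM extraction and returns
    a string (hypothetical; wire it to whatever client you use)."""
    answers = [sample_once().strip() for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples  # answer plus a rough agreement score
```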
I touch on this from my own experience: https://youtu.be/cs5cbxDClbM?si=IQIFAD38cVzLCs55&t=486