This is a strange example. An LLM would write a script for you to count the number of times "Wizard" is mentioned. That is what it always does when it comes to numbers, because it knows that counting is its weakness.
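For what it's worth, the kind of script an LLM would typically produce for this is trivial. A minimal sketch (function and sample text are my own, not from the article):

```python
import re

def count_mentions(text: str, word: str) -> int:
    # Whole-word, case-insensitive match: "Wizard" and "wizard" both count,
    # but "wizardry" does not.
    return len(re.findall(rf"\b{re.escape(word)}\b", text, re.IGNORECASE))

sample = "The Wizard spoke. A second wizard listened; wizardry filled the room."
print(count_mentions(sample, "wizard"))  # → 2
```

Which is exactly the point: the counting task reduces to a one-liner, so it tells you nothing about the hard long-context problems.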
Edit: going from counting words to finding task-relevant information across different pages isn't a valid analogy.
Agents are mentioned but not multi-agent architectures [0], where you could have one agent responsible for insurance policies, another for legal definitions, and/or a bot responsible for the big picture of the question. They would go back and forth, each an expert in its field (or task), and come to a conclusion after some iterations of API calls.
One of the big issues in this space seems to be wishful thinking. Have you tried using Autogen to solve any of the hard problems mentioned in the article?
I had a good go at a specific domain, with a corpus of only a dozen thousand pages, and I agree with the article: your agents need to build a solid ontology, and if you have found a way to get Autogen to do that, then please, please tell us.
Please see my comment below, and the "Why should I care" section of the post. Yes, you can count the number of times the word "wizard" is mentioned, but for tasks that aren't quite as cut-and-dried (say, listing out all of the core arguments of a 100-page legal case), you cannot just write a Python script.
The agentic approach falls apart because, again, a self-querying mechanism or a multi-agent framework still needs to know where in the document to look for each subset of information. That's why I argue that you need an ontology. And at that point, agents are moot: a small 7B model with a simple prompt suffices, without any of the unreliability of agents. I suggest trying agents on an actually serious document; the problems become pretty evident. That said, I do hope that they get there one day, because it will be cool.
LLMs see tokens, not words, and counting is a problem for them, long context or not.
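To make the token/word mismatch concrete, here is a toy illustration (the subword split shown is invented for illustration, not the output of any real tokenizer):

```python
# Toy illustration: a BPE-style tokenizer might split words into fragments,
# so the model never sees "wizard" as one atomic unit it could simply tally.
text = "The wizard met another wizard."
toy_tokens = ["The", " wiz", "ard", " met", " another", " wiz", "ard", "."]

word_count = text.lower().count("wizard")   # what a human (or script) counts
token_count = toy_tokens.count("wizard")    # the whole word never appears as a token
print(word_count, token_count)  # → 2 0
```

The model would have to reassemble " wiz" + "ard" before it could count anything, which is part of why counting is unreliable regardless of context length.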
Maybe the current state-of-the-art LLMs can't solve the kind of high-value long-context problems you have in mind, but what I can tell you is that you won't find that out by asking them to count.
Missed opportunity.
[0] https://microsoft.github.io/autogen/docs/Use-Cases/agent_cha...
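The back-and-forth described above can be sketched in a few lines. This is a skeleton only: `call_model` is a placeholder standing in for a real LLM API call, and the agent roles are made-up examples, not Autogen's actual API.

```python
# Minimal sketch of a multi-agent round-trip: each "agent" is just a system
# prompt plus the shared conversation history.

def call_model(system_prompt: str, history: list[str]) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return f"[{system_prompt}] responding to: {history[-1]}"

def run_round_trip(question: str, agents: dict[str, str], iterations: int = 2) -> list[str]:
    history = [question]
    for _ in range(iterations):
        for name, role in agents.items():
            reply = call_model(role, history)
            history.append(f"{name}: {reply}")
    return history

agents = {
    "policy_expert": "You answer only about insurance policies.",
    "legal_expert": "You answer only about legal definitions.",
    "coordinator": "You synthesize the experts' answers into a conclusion.",
}
transcript = run_round_trip("Does clause 4.2 exclude flood damage?", agents)
```

The skeleton also makes the earlier objection visible: nothing in the loop tells any agent *where* in a long document to look, which is the gap an ontology is supposed to fill.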