I don't use Mistral 7B alone; it's just one component in a RAG-based system, one that is 1) not clinician-facing, 2) not used in clinical decision making, 3) provides in-line reference sources so end users can validate information, and 4) is inherently human-in-the-loop.
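To sketch what I mean by "in-line reference sources" (purely illustrative; `retrieve`, `llm_generate`, and the citation format here are made-up stand-ins, not our actual stack):

```python
# Minimal sketch of a RAG answer-with-citations flow. Everything here is
# illustrative: `llm_generate` stands in for whatever model (e.g. Mistral 7B)
# sits behind the system, and retrieval is naive word overlap for brevity.
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str  # source document identifier shown to the end user
    text: str    # passage text fed to the model as context

def retrieve(query: str, corpus: list[Passage], k: int = 3) -> list[Passage]:
    """Rank passages by word overlap with the query (placeholder for real vector search)."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q & set(p.text.lower().split())), reverse=True)
    return ranked[:k]

def llm_generate(prompt: str) -> str:
    """Placeholder for the actual model call; returns a canned draft here."""
    return "Draft answer based only on the supplied passages."

def answer(query: str, corpus: list[Passage]) -> str:
    passages = retrieve(query, corpus)
    context = "\n\n".join(f"[{i+1}] ({p.doc_id}) {p.text}" for i, p in enumerate(passages))
    draft = llm_generate(f"Answer using ONLY the numbered passages below.\n\n{context}\n\nQ: {query}")
    # Append the sources verbatim so the reader can check every claim themselves.
    refs = "\n".join(f"[{i+1}] {p.doc_id}" for i, p in enumerate(passages))
    return f"{draft}\n\nSources:\n{refs}"
```

The design point is that the model never answers from its weights alone, and the reviewer always sees which passage each claim came from.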
> Why are you confident to use such a tiny model on something so critical?
Don't use any LLM for anything critical. They can't innately be trusted, by design; why on earth would you use one for something where you need reliability?
A key thing to remember in this specific moment in history is that the vast majority of people will be as lazy as they can get away with being. People want to leave work and go home, and if an LLM lets them do that faster, who cares if it's accurate? It can absorb the blame.
It's summarisation; who cares if it's right as long as you feel confident after reading it? /s
In my experience, even GPT-4o is terrible at surfacing information from anything longer than a few pages.
It might be an issue with dimensionality reduction in general, though. If you think about it, you can't really compress what a given amount of text contains into much less text, unless the source was written extremely inefficiently.
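Rough intuition for why (a crude sketch using zlib's compressed size as a stand-in for information content; the sample strings are made up): a faithful summary can only cut roughly the redundant fraction of the text, so dense prose leaves little that's safe to remove.

```python
import zlib

def redundancy(text: str) -> float:
    """Fraction of the text that is 'slack': ~0 = incompressible, ~1 = pure repetition."""
    raw = text.encode("utf-8")
    return 1.0 - len(zlib.compress(raw, 9)) / len(raw)

boilerplate = "The committee met and the committee agreed the committee would meet again. " * 20
dense = "Grant ID 4821: amend clause 7(b), replace 'net 30' with 'net 45', effective 2024-07-01."

print(f"boilerplate: {redundancy(boilerplate):.2f}")  # high: easy to shorten without losing anything
print(f"dense note:  {redundancy(dense):.2f}")        # low, possibly near 0: little to cut
```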
It seems okay at producing outlines, or maybe a form of abstract, but you'd never really know where it fails unless you read the entire source text yourself first. IMO, it's not worth the risk unless you plan to read the source anyway, or it's not really important.