I've spent quite a lot of time in the medical/scientific literature space. With regard to LLMs, specifically RAG, how the data is chunked is quite important. With that in mind, I have a couple of projects that might be beneficial additions.
paperetl (https://github.com/neuml/paperetl) - supports parsing arXiv and PubMed content and integrates with GROBID to handle parsing metadata and text from arbitrary papers.
paperai (https://github.com/neuml/paperai) - builds embeddings databases of medical/scientific papers. Supports LLM prompting, semantic workflows and vector search. Built with txtai (https://github.com/neuml/txtai).
While arbitrary chunking/splitting can work, I've found that parsing that is aware of medical/scientific paper structure increases the overall accuracy and quality of downstream applications.
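To make that concrete, here's roughly what section-aware chunking looks like against GROBID's TEI XML output. A minimal sketch: real TEI nesting varies by paper and error handling is omitted.

```python
# Minimal sketch: one chunk per paper section from GROBID's TEI XML output
# (the /api/processFulltextDocument response).
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def section_chunks(tei_xml: str):
    """Yield (section_title, section_text) pairs, one per body <div>."""
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{TEI}text/{TEI}body")
    for div in body.findall(f"{TEI}div"):
        head = div.find(f"{TEI}head")
        title = "".join(head.itertext()).strip() if head is not None else "untitled"
        text = "\n".join("".join(p.itertext()).strip() for p in div.findall(f"{TEI}p"))
        yield title, text

# Each (title, text) pair becomes one retrieval unit, so "Methods" and
# "Results" stay intact instead of being split mid-section.
```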
it would accelerate research so much if LLM accuracy increased on biomedical papers.
very much agreed on the potential to extract signal from paper structures.
two questions if you don't mind:
1. did you post a summary of your chunking analysis somewhere? i'm curious which method maximized accuracy, and which sentence-overlap methods were most effective.
2. do you think general tokenization methods limit LLMs on scientific/biomedical papers?
> 1. did you post a summary of your chunking analysis somewhere? i'm curious which method maximized accuracy, and which sentence-overlap methods were most effective.
Good idea, but no, nothing posted. In general, grouping by the sections of a paper has worked best (e.g. methods, results, conclusions). GROBID is helpful with arbitrary papers.
> 2. do you think general tokenization methods limit LLMs on scientific/biomedical papers?
The problem is that this is still just mechanical retrieval. In RAG, you split a PDF into small chunks, but that is very different from how humans digest PDFs. If I hire an RA to go through my Zotero library and make a mind-map of sorts, he/she would combine papers, paragraphs, figures, etc. to come up with a "concepts" map, which is far richer than a retrieval system that merely finds the semantic similarity between my query and pieces of text.
RAG is good for semantic search, but really we need something that works at a knowledge/understanding level as opposed to data/information level.
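To illustrate the gap, here's a toy sketch of what an automated version of that concept map might look like; llm() is a hypothetical completion helper and the prompt is a placeholder, not any existing tool's API:

```python
# Toy sketch of an RA-style "concepts" map built on top of chunked papers:
# pull concepts out of each chunk with an LLM, then link papers that share
# concepts. llm() is a hypothetical completion helper, not a real API.
from collections import defaultdict

def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for any completion API")

def extract_concepts(text: str) -> list[str]:
    reply = llm(f"List the key concepts in this passage, one per line:\n{text}")
    return [line.strip().lower() for line in reply.splitlines() if line.strip()]

def build_concept_map(chunks):
    """chunks: list of (paper_id, text) pairs."""
    index = defaultdict(set)
    for paper_id, text in chunks:
        for concept in extract_concepts(text):
            index[concept].add(paper_id)
    # Concepts shared across papers are the cross-document links that
    # query-to-chunk similarity search never materializes explicitly.
    return {c: sorted(papers) for c, papers in index.items() if len(papers) > 1}
```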
I think what you're looking for is possible with LLM agents. For paperai (mentioned previously) at least, it's possible to build workflows that connect multiple prompt steps together.
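A minimal sketch of what that chaining looks like with txtai's Workflow/Task API, which is what paperai builds on; the model choice and prompts here are placeholders:

```python
# Chain two prompt steps into one workflow.
from txtai.pipeline import LLM
from txtai.workflow import Task, Workflow

llm = LLM("google/flan-t5-base")  # small placeholder model

# Each Task action receives a batch (list) of inputs and returns a list.
summarize = Task(lambda texts: [llm(f"Summarize the methods:\n{t}") for t in texts])
findings = Task(lambda texts: [llm(f"List the key findings:\n{t}") for t in texts])

workflow = Workflow([summarize, findings])

paper_text = "We trained a transformer on 10k PubMed abstracts and evaluated recall."
print(list(workflow([paper_text])))
```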
In the example of the RA being automated by an LLM agent workflow, I agree it's very possible. It requires defining a set of specific agents, using prompts and allowing function calling for tools, and then defining a full workflow between the agents. The workflow can likely be modeled by breaking down the individual steps the RA takes when doing their work.
The agents are likely very narrow and specific; each does one well-defined task. The workflow is then a DAG chaining their work together, as in the sketch below.
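A toy sketch of that shape in plain Python; the agent bodies are stand-ins for real prompt/tool-calling steps:

```python
# Each "agent" is a narrow function reading from and writing to a shared
# context; the DAG runs them in dependency order.
from graphlib import TopologicalSorter

def find_papers(ctx):
    return {"papers": ["paper1.pdf", "paper2.pdf"]}

def extract_claims(ctx):
    return {"claims": [f"main claim of {p}" for p in ctx["papers"]]}

def summarize(ctx):
    return {"summary": "; ".join(ctx["claims"])}

agents = {"find_papers": find_papers, "extract_claims": extract_claims, "summarize": summarize}
dag = {"find_papers": set(), "extract_claims": {"find_papers"}, "summarize": {"extract_claims"}}

ctx = {}
for name in TopologicalSorter(dag).static_order():
    ctx.update(agents[name](ctx))  # run agents in dependency order
print(ctx["summary"])
```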
Why does this post link to a renamed fork of Paper-QA (https://github.com/whitead/paper-qa) which has made zero changes and is 19 commits behind the original?
Maybe a stupid question, but how are equations handled when parsing a paper? Are locally runnable LLMs capable of proposing model equations the way they propose programming code? I have seen that GPT-4 can, so I'm just wondering whether equations are "treated" like normal computer code. My Zotero papers are equation-heavy.
I looked into the available options for parsing PDFs a while ago, including pypdf, which is what is being used here, and it's not good. While I haven't tested equations specifically, I think it's fair to assume that the results will be subpar, especially for complex ones.
I guess this could be an application of the agent model. I've seen multiple LLMs recently trained specifically on LaTeX parsing. One model would recognize from the parsed PDF garbage that there is probably an equation there and call a different one to parse it.
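The detection half could be as simple as a heuristic. A rough sketch, where the symbol set and threshold are guesses and the image-to-LaTeX handoff is only indicated in a comment:

```python
# Flag extracted spans whose math-symbol density suggests a mangled equation.
import re

MATH_CHARS = re.compile(r"[=+\-*/^_\\{}<>|∂∑∫√α-ω]")

def looks_like_garbled_equation(span: str, threshold: float = 0.2) -> bool:
    stripped = span.strip()
    if not stripped:
        return False
    return len(MATH_CHARS.findall(stripped)) / len(stripped) > threshold

# Spans that trip the heuristic would be cropped from the rendered page
# image and sent to an image-to-LaTeX model instead of being kept as text.
print(looks_like_garbled_equation("The results in Table 2 show"))  # False
print(looks_like_garbled_equation("∂u ∂t = α ∂2u ∂x2 |x=0"))       # True
```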
Thank you for the idea of recognizing the garbage and then using a different flow for the image of the equation from the PDF. That still leaves an image-to-LaTeX problem, but maybe the state of the art has improved in the past few years.
Thanks for sharing various projects. Any tools for materials science that can create summary tables of things like material, application, performance would be really valuable.
This is built on LangChain, and I think it's also possible to build it on top of Haystack now. I'm torn between the two, and I'm wondering whether this project is a good example of why LangChain can be a better fit in certain situations; I'm just not sure what those situations are exactly.
Oh no I'm just realizing that arxiv will be increasingly spammed with what should have been a blog post. I hope I'm wrong in assuming that in a few years the level of credibility that comes with a paper being on arxiv will have entirely worn off.
I know that in theory arXiv, being a pre-print server, shouldn't confer any credibility, but in practice it does, and it is still a good quality/BS filter compared to e.g. Medium articles.
In my field, ArXiv has about the same level of credibility as Wikipedia or random journal articles from the International Journal of Sciency Science, i.e. trust, but verify. Among non-peer-reviewed documents, they rank below things like DoE or NASA reports and tend to not be cited.
There are preprints of articles that have since been published (which have the same credibility as the peer-reviewed article), articles from mates (which are obviously great), and the rest, which might be interesting but is not a solid source on its own.
It seems to be working as intended, to be fair. ArXiv has precious few ways of improving the accuracy of preprints.
There's a tool I use called Petal (https://www.petal.org/reference-manager). The free tier allows up to 1GB of PDFs, which I believe are processed by GROBID and chunked for LLM QA.
The feature I find most useful is the table automation which I use for literature review, since it lets me run the same QA prompts on a collection of documents all at once.
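The underlying pattern is easy to sketch generically; here ask() is a hypothetical document-QA helper standing in for whatever backend does the retrieval, not Petal's actual API:

```python
# Run one fixed set of QA prompts over every document and collect the
# answers as rows of a literature-review table.
import csv
import sys

QUESTIONS = [
    "What methods were used?",
    "What were the main findings?",
    "What limitations are noted?",
]

def literature_table(documents, ask):
    """One row per document: [doc, answer1, answer2, ...]."""
    return [[doc] + [ask(doc, q) for q in QUESTIONS] for doc in documents]

writer = csv.writer(sys.stdout)
writer.writerow(["document"] + QUESTIONS)
writer.writerows(literature_table(["a.pdf", "b.pdf"], lambda d, q: f"answer for {d}"))
```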
That would be fantastic. At the moment the barrier to entry for using these kinds of models is quite high. Something that could be used from a GUI would be great.