Oracle of Zotero: LLM QA of Your Research Library (github.com/frost-group)
172 points by SubiculumCode on Nov 26, 2023 | 24 comments



Nice project!

I've spent quite a lot of time in the medical/scientific literature space. With regards to LLMs, specifically RAG, how the data is chunked is quite important. With that, I have a couple projects that might be beneficial additions.

paperetl (https://github.com/neuml/paperetl) - supports parsing arXiv, PubMed and integrates with GROBID to handle parsing metadata and text from arbitrary papers.

paperai (https://github.com/neuml/paperai) - builds embeddings databases of medical/scientific papers. Supports LLM prompting, semantic workflows and vector search. Built with txtai (https://github.com/neuml/txtai).

While arbitrary chunking/splitting can work, I've found that integrating parsing that has knowledge of medical/scientific paper structure increases the overall accuracy and experience of downstream applications.
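
As a rough illustration of what indexing those structure-aware chunks can look like with txtai (an untested sketch; the chunks and model path are just placeholders, not output from a real pipeline):

    from txtai.embeddings import Embeddings

    # Placeholder section-level chunks; in practice these would come from paperetl/GROBID
    chunks = [
        "Methods: we fine-tuned a transformer on PubMed abstracts...",
        "Results: accuracy improved by 4 points over the baseline...",
    ]

    # Any sentence-embeddings model path works here; this one is just an example
    embeddings = Embeddings({"path": "NeuML/pubmedbert-base-embeddings", "content": True})
    embeddings.index((uid, text, None) for uid, text in enumerate(chunks))

    print(embeddings.search("what was the accuracy gain?", 1))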


these are awesome projects. thanks for sharing.

it would accelerate research so much if LLM accuracy increased on biomedical papers.

very much agreed on the potential to extract signal from paper structures.

two questions if you don't mind:

1. did you post a summary of your chunking analysis somewhere? i'm curious which method maximized accuracy, and which sentence-overlap methods were most effective.

2. do you think general tokenization methods limit LLMs on scientific/biomedical papers?


Appreciate it!

> 1. did you post a summary of your chunking analysis somewhere? i'm curious which method maximized accuracy, and which sentence-overlap methods were most effective.

Good idea on this but nothing posted. In general, grouping by sections of a paper has worked best (i.e. methods, conclusions, results etc). GROBID is helpful with arbitrary papers.
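
For example, grouping text by section heading from GROBID's TEI output looks roughly like this (untested sketch; element names are from memory of the TEI format GROBID emits):

    import xml.etree.ElementTree as ET

    NS = {"tei": "http://www.tei-c.org/ns/1.0"}
    root = ET.parse("paper.tei.xml").getroot()  # output of a prior GROBID run

    chunks = []
    body = root.find(".//tei:body", NS)
    for div in body.findall("tei:div", NS):
        head = div.find("tei:head", NS)
        section = head.text if head is not None else "untitled"
        text = " ".join("".join(p.itertext()) for p in div.findall("tei:p", NS))
        if text.strip():
            chunks.append({"section": section, "text": text})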

> 2. do you think general tokenization methods limit LLMs on scientific/biomedical papers?

Possibly. For vectorization, specifically with medical, I do have this model (https://huggingface.co/NeuML/pubmedbert-base-embeddings) which is a fine-tuned sentence embeddings model using this base model (https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-u...). The base model does have a custom vocabulary.
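
If it helps, that embeddings model can be used directly with sentence-transformers, roughly like this (untested sketch):

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("NeuML/pubmedbert-base-embeddings")
    vectors = model.encode([
        "aspirin reduced cardiovascular events in the trial cohort",
        "the control group received a placebo",
    ])
    print(vectors.shape)  # one vector per input sentence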

In terms of LLMs, I've found that this model (https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) works well but haven't experimented with domain specific LLMs.


The problem is that this is still just mechanical retrieval. In RAG, you split a PDF into small chunks, but this is way different from how humans digest PDFs. If I hire an RA to go through my Zotero lib and make a mind-map of sorts, he/she would combine papers, paragraphs, figures, etc. to come up with a "concepts" map, which is way richer than a retrieval system that merely finds the semantic similarity between my query and pieces of text.

RAG is good for semantic search, but really we need something that works at a knowledge/understanding level as opposed to data/information level.


I think what you're looking for is possible with LLM agents. For paperai (mentioned previously) at least, it's possible to build workflows that connect multiple prompt steps together.

txtai (included with paperai) has the ability to build semantic graphs (https://neuml.hashnode.dev/introducing-the-semantic-graph).

I agree that RAG is just one part of the equation. But the tools are available if one wanted to build their own complex multi-agent workflow.
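
A minimal sketch of chaining prompt steps with txtai workflows (untested; ask() here is a stand-in for an actual LLM call, not a real API):

    from txtai.workflow import Workflow, Task

    def ask(prompt, texts):
        # stand-in for an LLM prompt step
        return [f"{prompt}: {t[:60]}" for t in texts]

    workflow = Workflow([
        Task(lambda texts: ask("Summarize the methods", texts)),
        Task(lambda summaries: ask("List limitations given this summary", summaries)),
    ])

    results = list(workflow(["full text of paper 1", "full text of paper 2"]))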


In the example of the RA being automated by an LLM agent workflow, I agree it's very possible. It requires defining a set of specific agents, using prompts and allowing function calling for tools, and then defining a full workflow between the agents. The workflow can likely be modeled by breaking down the individual steps the RA takes when doing their work.

The agents are likely very narrow and specific; they each do one very specific task. Then the workflow is a DAG chaining their work together.
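
Something like this toy sketch (call_llm and the prompts are placeholders, not a real API):

    def call_llm(prompt):
        # hypothetical: swap in whatever model client you actually use
        raise NotImplementedError

    def extract_claims(paper_text):
        return call_llm(f"List the key claims in this paper:\n{paper_text}")

    def link_claims(a, b):
        return call_llm(f"Which of these claims support or contradict each other?\n{a}\n{b}")

    def outline_concepts(links):
        return call_llm(f"Turn these relationships into a concept-map outline:\n{links}")

    # The DAG: extract per paper, then link pairwise, then summarize
    papers = ["text of paper A", "text of paper B"]
    claims = [extract_claims(p) for p in papers]
    outline = outline_concepts(link_claims(claims[0], claims[1]))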


There are lots of experiments around generating knowledge graph mutations and queries to build this kind of relational knowledge.

In neo4j, for example, relations tend to have natural language names. (The cat BELONGS_TO the human)

So LLMs appear to be adept at generating those queries.
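
For instance (rough sketch; generate_cypher() is a placeholder for the model call, and the connection details are made up):

    from neo4j import GraphDatabase

    def generate_cypher(question):
        # placeholder: an LLM would draft something like this from the question
        return "MATCH (c:Cat)-[:BELONGS_TO]->(h:Human) RETURN c.name, h.name"

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    with driver.session() as session:
        for record in session.run(generate_cypher("Which cats belong to which humans?")):
            print(record.data())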


Why does this post link to a renamed fork of Paper-QA (https://github.com/whitead/paper-qa) which has made zero changes and is 19 commits behind the original?


I spent far too long trying to figure that out as well. It's a much catchier name, for sure, but sort of silly that it has so many forks itself.


Maybe a stupid question, but how are equations handled when parsing a paper? Are locally runnable LLMs capable of proposing model equations the way they propose programming code? I have seen that GPT-4 can, so I'm just wondering if equations are "treated" like normal computer code. My Zotero papers are equation-heavy.


I looked into the available options for parsing PDFs, including pypdf (which is what is being used here), a while ago and it's not good. While I haven't tested equations specifically, I think it's fair to assume the results will be subpar, especially for complex ones.

I guess this could be an application of the agent model. I've seen multiple LLMs recently trained specifically on LaTeX parsing. One model would recognize from the parsed PDF garbage that there is probably an equation there and call a different one to parse it.
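
Roughly what I mean (untested sketch; the heuristic and the hand-off are made up):

    from pypdf import PdfReader

    def looks_like_equation(line):
        # crude heuristic: symbol-heavy, few words
        symbols = sum(c in "=+-^_\\{}()|" for c in line)
        return symbols > 3 and len(line.split()) < 8

    reader = PdfReader("paper.pdf")
    for page in reader.pages:
        for line in (page.extract_text() or "").splitlines():
            if looks_like_equation(line):
                # hand this region off to a separate LaTeX-focused model here
                print("possible equation:", line)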


Thank you for the idea of recognizing the garbage and then using a different flow for the image of the equation from the PDF. I'm still left with an image-to-LaTeX problem, but maybe the state of the art has improved in the past few years.


Their huggingface demo: https://huggingface.co/spaces/whitead/paper-qa

At the moment it returns a runtime error.

Edit: It's because of a missing OpenAI API key, https://huggingface.co/spaces/whitead/paper-qa/blob/main/app...


Thanks for sharing various projects. Any tools for materials science that can create summary tables of things like material, application, performance would be really valuable.


This is built on Langchain, and I think it's also possible to build it on top of Haystack now. I'm torn between the two, and I'm wondering if this project provides a good example of why Langchain can be a better fit in certain situations; I'm just not sure what those are exactly.


There are a lot of great options. This paper gives a comprehensive overview on the state of prompting frameworks: https://arxiv.org/abs/2311.12785


Oh no, I'm just realizing that arxiv will be increasingly spammed with what should have been blog posts. I hope I'm wrong in assuming that in a few years the level of credibility that comes with a paper being on arxiv will have entirely worn off.

I know that in theory arxiv, being a pre-print server, shouldn't confer any credibility, but in practice it does, and it is still a good quality/BS filter compared to e.g. Medium articles.


In my field, ArXiv has about the same level of credibility as Wikipedia or random journal articles from the International Journal of Sciency Science, i.e. trust, but verify. Among non-peer-reviewed documents, they rank below things like DoE or NASA reports and tend not to be cited.

There are preprints of articles that have since been published (which have the same credibility as the peer-reviewed article), articles from mates (which are obviously great), and the rest, which might be interesting but isn't a solid source on its own.

It seems to be working as intended, to be fair. ArXiv has precious few ways of improving the accuracy of the preprints.


At a glance, the paper looks very polished. Combine this with the fact that arxiv is invite-only, and your prediction might not come about.


The git repo [1] referenced in the paper is oddly... basically empty? Weird.

[1] https://github.com/lxx0628/Prompting-Framework-Survey


mmh, I was kind of hoping for something more finished ^^


There's a tool I use called Petal https://www.petal.org/reference-manager. The free tier allows up to 1GB of PDFs, which I believe are processed by GROBID and chunked for LLM QA.

The feature I find most useful is the table automation which I use for literature review, since it lets me run the same QA prompts on a collection of documents all at once.
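
The underlying pattern (the same QA prompts over every document in a collection) is simple enough to sketch yourself; this is untested and answer() is a stand-in for whatever QA backend you use:

    import csv

    questions = ["What material is studied?", "What is the application?", "What performance is reported?"]

    def answer(paper, question):
        # stand-in for a RAG / document QA call
        return "..."

    with open("review_table.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["paper"] + questions)
        for paper in ["a.pdf", "b.pdf"]:
            writer.writerow([paper] + [answer(paper, q) for q in questions])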


I was too, but I posted anyway. I'd like it built into a legitimate plugin inside the Zotero app.


That would be fantastic. At the moment the barrier to entry to use these kinds of models is quite high. Something that could be used from the GUI would be great.



