Loki: An open-source tool for fact verification (github.com/libr-ai)
238 points by Xudong on April 6, 2024 | 68 comments



Overall great idea though, I'll definitely be checking back on it in the future. A few things that hit me right out of the box:

* The idea behind using Serper is great, however it would be cool if other search engines/data sources could be used instead, e.g. Kagi or some private search engine/data. Reason for the latter: there are tons of people who are sourcing all sorts of information which will not immediately show up on Google, and some of it might never show up at all. For context: I have roughly 60GB (and growing) of cleaned news articles, along with where I got them from and with a good amount of pre-processing done on the fly (I collect these all the time).

* Relying heavily on OpenAI. Yes, OpenAI is great, but there's always that thing at the back of our minds: "where are all those queries going, and do we trust that shit won't hit the fan some day?" It would be nice to have the ability to use a local LLM, given how many good ones there are around.

* The installation can be improved massively: setuptools + entry_points + console_scripts would avoid all the hassle of having to manage dependencies, where your scripts are located and all that. The cp factcheck/config/secret_dict.template factcheck/config/secret_dict.py step is a bit.... Uuuugh... pydantic[dotenv] + .env? That would also make containerizing the application so much easier (rough sketch after this list).
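To make the packaging point concrete, here's a rough sketch of what I have in mind; the entry point, module paths and setting names are hypothetical, not the project's actual layout:

    # setup.py: console_scripts gives you a real CLI after `pip install .`
    from setuptools import setup, find_packages

    setup(
        name="openfactverification",
        packages=find_packages(),
        install_requires=["pydantic[dotenv]<2", "openai"],
        entry_points={
            "console_scripts": [
                # hypothetical: factcheck/cli.py exposing a main() function
                "loki-factcheck = factcheck.cli:main",
            ],
        },
    )

    # factcheck/config.py: secrets from a .env file instead of a copied template
    from pydantic import BaseSettings  # pydantic v1 with the dotenv extra

    class Settings(BaseSettings):
        openai_api_key: str
        serper_api_key: str

        class Config:
            env_file = ".env"

    settings = Settings()  # reads OPENAI_API_KEY / SERPER_API_KEY from .env or the environment

With something like that, containerizing is basically copying the code and mounting a .env (or passing env vars).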


Thank you for your suggestions, axegon!!! We will definitely consider them and add these features in an upcoming version.

Regarding the first point, we are currently working on enabling customized evidence retrieval, including local files. Our plan is to integrate existing tools like LlamaIndex. Any suggestions are greatly appreciated!
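As a rough illustration of the direction (purely a sketch, not code that exists in Loki yet, and assuming the llama_index.core import path of recent releases), indexing a local folder of documents as an evidence source could look like this:

    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # Build a vector index over a local folder of articles / notes.
    # (Uses an OpenAI embedding model by default, so OPENAI_API_KEY must be set.)
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)

    # Retrieve candidate evidence passages for a decomposed claim.
    retriever = index.as_retriever(similarity_top_k=5)
    for hit in retriever.retrieve("The Earth's circumference is about 40,075 km."):
        print(hit.score, hit.node.get_content()[:200])

The retrieved passages would then feed the verification step the same way web search results do.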

Regarding the second point, we have found OpenAI's JSON mode to be greatly helpful, and have optimized our prompts to take full advantage of it. However, we agree that it would be beneficial to enable the use of other models. As promised, we will add this feature soon.
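For readers unfamiliar with JSON mode, this is roughly what it looks like with the OpenAI Python SDK (the prompt here is only an illustration, not one of our actual prompts):

    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # JSON mode constrains the model to return a syntactically valid JSON object,
    # which makes parsing decomposed claims far more reliable than free-form text.
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": 'Split the user text into atomic claims. Respond with JSON: {"claims": [...]}'},
            {"role": "user",
             "content": "The Eiffel Tower is in Paris and was completed in 1889."},
        ],
    )
    claims = json.loads(response.choices[0].message.content)["claims"]
    print(claims)

Any alternative model we add would need to return equally reliable structured output for the pipeline to parse.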

Lastly, we appreciate your suggestion and will work on improving the installation process for the next version.


Dead internet.


Have to agree with you, every comment from the product creator reads like a ChatGPT response.


To me it sounds like someone who speaks English as a second language, writing well and clearly and in a formal style. It's just unlucky for them that that's the style GPT is so good at too.

I'm a native English speaker, and traditionally when it comes to formal/professional written English (emails etc.) my instincts take me towards sounding quite GPTish. Luckily I've got a good grasp of the language and have found it fairly easy to alter my formal writing style to be a bit less traditional and a bit less formal too. But if it wasn't my first language, and I wasn't a fair bit above average in writing ability even for native speakers, I suspect it wouldn't be nearly as easy to go against how I was taught at school to write in formal situations.

It's really not enough to see that somebody writes roughly in that style to assume they're using LLMs, because the reason LLMs so often sound like that is that they've learned from humans who very often sound like that.

In a case like this one, it maybe set off your LLM suspicions because culturally you wouldn't expect somebody to sound so formal in comments on a site like HN, and choosing the wrong tone of voice for the context is something an LLM is likely to do. But if a) English isn't your first language nor part of your primary culture, and b) you want to make a good impression because the subject of the thread is something you created, so you're essentially acting as its spokesperson in the comments, then all of a sudden writing formally rather than as if dashing off throwaway forum comments makes sense rather than looking like an indication that AI wrote it.


+1, reads like a non-native speaker writing very polite and formal prose to a customer. ChatGPT has a very peculiar way of speaking that betrays a psychotic mind plotting your enslavement in a global labeling farm.


I wouldn't say it's formal. It's the overly optimistic tone and 100% coverage of the parent post. The fact we can't tell for sure emphasizes my point.


I will take it as a compliment, lol. But I do hope ChatGPT or some agents could help me with this. Btw, our recent study on machine-generated text detection might be interesting to you.

https://arxiv.org/abs/2305.14902 https://arxiv.org/abs/2402.11175


I fully expect some sort of enshittification of openai at some point.


That's assuming it hasn't happened already, with their mission of being open completely forgotten.


Feedback on the example gif: at the moment it's almost comically useless. First you're bored watching the first 90% while commands are slowly being typed, and then the bit that's actually interesting and worth reading scrolls by too fast and resets to the beginning of the gif before there's a chance to read it.


Thanks for your feedback on the gif figure, swores! We will revise it soon.


mpv ftw: playback speed control, even for gifs.


Maybe the name is not so fitting, as Loki is a figure in Norse mythology known for deceiving and lying, which is basically the opposite of what you're trying to do :)


It's also the name of a well-known open-source log collection system that's part of the LGTM stack (predominantly led by Grafana Labs).


Don't want to be that guy, but I still don't get it :) You want your logs to lie? :D


You definitely want to be that guy ;) (but I thought it was funny the first time)


Maybe it's on purpose.

Who could know the patterns of liars better than the god of lying?


When coming up with a name for something, the significance of the name should be the first thing that comes to mind, not the opposite thing or a thing that requires explanation. When people say the name of your tool, you won’t always be there to explain it.


Discord comes to mind, although I know it's a play on Eris (Free Net), same as Slack is a nod to Uncle Bob.


>The name Discord was chosen because it "sounds cool and has to do with talking", was easy to say, spell, remember, and was available for trademark and website. In addition, "Discord in the gaming community" was the problem they wished to solve.


I didn't get this, can you tell me more?


Exactly, it needs to be an "Ah, of course" moment from the first second


Apollo, Veritas, or Aletheia/Alethia.


Reverse psychology. I like it, but it still makes you wonder, no?


So the only thing they open-sourced is the prompts [1] and the code to call LLM APIs? There are plenty of such libraries out there. And the prompts seem to be copied from here [2]?

[1] https://github.com/Libr-AI/OpenFactVerification/blob/main/fa...

[2] https://github.com/yuxiaw/Factcheck-GPT/blob/main/src/utils/...


Regarding your last concern, I found that yuxiaw is their COO [1], so it can't really be considered a copy?

[1] https://www.librai.tech/team


OK, but the bigger issue is that there is evidence that LLMs are not better than specialized models for fact-checking. https://arxiv.org/abs/2402.12147


Hello vinni2, thank you for mentioning the paper. However, I noticed that it hasn't gone through peer review yet. Also, the paper suggests that fine-tuning may work better than in-context learning, but that isn't a problem here: you can fine-tune an LLM like GPT-3.5 for this purpose and use it with this framework. Once you have fine-tuned GPT-3.5 on task-specific data, for example, you only need to change the model name (https://github.com/Libr-AI/OpenFactVerification/blob/8fd1da9...). I believe this approach can lead to better results than what the paper suggests.
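For illustration only, and with a made-up fine-tuned model ID, the swap would look roughly like this with the OpenAI SDK:

    from openai import OpenAI

    client = OpenAI()

    # A fine-tuned model is called exactly like a base model; only the name changes.
    # The ID below is hypothetical, in the format returned by the fine-tuning API.
    FINETUNED_MODEL = "ft:gpt-3.5-turbo-0125:my-org:factcheck:abc123"

    response = client.chat.completions.create(
        model=FINETUNED_MODEL,
        messages=[
            {"role": "system",
             "content": "Given the evidence, classify the claim as SUPPORTED or REFUTED."},
            {"role": "user",
             "content": "Claim: ...\nEvidence: ..."},
        ],
    )
    print(response.choices[0].message.content)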


It’s a bit misleading to call it an open-source tool when it relies on proprietary LLMs for everything.


Presumably the LLMs are swappable -- today the proprietary ones are very powerful and accessible, but the landscape may yet change.


Well, but they don’t mention that; it’s clickbait to call it an open-source fact-checking tool when it needs proprietary LLMs to do everything. Also, the code is not designed to easily swap in a free, locally running LLM.


I apologize for any confusion caused earlier. The core components have been defined separately (https://github.com/Libr-AI/OpenFactVerification/tree/main/fa...) to make customization easier. We understand that switching between different LLMs isn't particularly easy in the current version. However, we will be adding these features in future versions. You are most welcome to collaborate with us and contribute to this project!
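To give a rough idea of the direction we have in mind (purely a sketch, not the current code), the pipeline could depend on a small client interface, with OpenAI and any OpenAI-compatible local server (llama.cpp, vLLM, Ollama, etc.) as interchangeable backends:

    from abc import ABC, abstractmethod
    from openai import OpenAI

    class LLMClient(ABC):
        """Minimal interface the verification steps would talk to (illustrative only)."""

        @abstractmethod
        def complete(self, system: str, user: str) -> str: ...

    class OpenAIClient(LLMClient):
        def __init__(self, model: str = "gpt-4-turbo"):
            self.client = OpenAI()
            self.model = model

        def complete(self, system: str, user: str) -> str:
            resp = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "system", "content": system},
                          {"role": "user", "content": user}],
            )
            return resp.choices[0].message.content

    class LocalClient(OpenAIClient):
        """Any OpenAI-compatible local server works through the same code path."""

        def __init__(self, model: str, base_url: str = "http://localhost:8000/v1"):
            self.client = OpenAI(base_url=base_url, api_key="not-needed")
            self.model = model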


Isn't this similar to the Deepmind paper on long form factuality posted a few days ago?

https://arxiv.org/abs/2403.18802

https://github.com/google-deepmind/long-form-factuality/tree...


Yes, they are similar. Actually, our initial paper was presented around five months ago (https://arxiv.org/abs/2311.09000). Unfortunately, our paper isn't cited by the DeepMind paper; you can see this discussion as an example: https://x.com/gregd_nlp/status/1773453723655696431

Compared with our initial version, we have mainly focused on efficiency: the checking process is now 10x faster without any decrease in accuracy.


> We further construct an open-domain document-level factuality benchmark in three-level granularity: claim, sentence and document

A 2020 Meta paper [1] mentions FEVER [2], which was published in 2018.

[1] "Language models as fact checkers?" (2020) https://scholar.google.com/scholar?cites=3466959631133385664

[2] https://paperswithcode.com/dataset/fever

I've collected various ideas for publishing premises as linked data; "#StructuredPremises" "#nbmeta" https://www.google.com/search?q=%22structuredpremises%22

From "GenAI and erroneous medical references" https://news.ycombinator.com/item?id=39497333 :

>> Additional layers of these 'LLMs' could read the responses and determine whether their premises are valid and their logic is sound as necessary to support the presented conclusion(s), and then just suggest a different citation URL for the preceding text

> [...] "Find tests for this code"

> "Find citations for this bias"

From https://news.ycombinator.com/item?id=38353285 :

> "LLMs cannot find reasoning errors, but can correct them" https://news.ycombinator.com/item?id=38353285

> "Misalignment and [...]"


> This tool is especially useful for journalists, researchers, and anyone interested in the factuality of information.

Sorry, I think an individual who is not aware of reliable sources to verify information, and who is not familiar enough with LLMs to come up with appropriate prompts and judge their output, should be the last person presenting themselves as the judge of factual information.


Thanks for your response. When discussing fact-checking capabilities, the key question is always: Can we guarantee that it will always offer the correct justification? While it's unfortunate, errors can occur. Nonetheless, we prioritize making the checking process both interpretable and transparent, allowing users to understand and trust the rationale behind each assessment.

We present the results at each step to help users understand the decision process, which can be seen from our screenshot at https://raw.githubusercontent.com/Libr-AI/OpenFactVerificati...

We will try our best to ensure this tool makes a positive difference.


Very cool! I’ve toyed with an idea like this for a while. The scraping is a cool extra feature, but tbh just breaking down text into verifiable claims and setting up the logic tokens is way cooler.

I imagine somebody feeding a live presidential debate into this. Could be a great tool for fact-checking.


ahah thanks!


That seems like something unlikely to be automated well, and not something that at least current-gen AI is capable of.

Does it...work?


Hi there, I agree that fact-checking is not something that current generative AI models can directly solve. Therefore, we decompose this complex task into five simpler steps, which current techniques can solve better. Please refer to https://github.com/Libr-AI/OpenFactVerification?tab=readme-o... for more details.
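In code, the five steps look roughly like this (an illustrative sketch with simplified prompts and a stubbed search function, not our actual module layout or prompts):

    import json
    from openai import OpenAI

    client = OpenAI()

    def ask(system: str, user: str) -> dict:
        # Every LLM step requests JSON so the output can be parsed reliably.
        resp = client.chat.completions.create(
            model="gpt-4-turbo",
            response_format={"type": "json_object"},
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return json.loads(resp.choices[0].message.content)

    def search_evidence(queries: list[str]) -> list[str]:
        # Placeholder: call a search API (e.g. Serper) and return snippet texts.
        return []

    def fact_check(text: str) -> list[dict]:
        # 1. Decompose the input into atomic claims.
        claims = ask('Split the text into atomic claims. JSON: {"claims": [...]}', text)["claims"]
        results = []
        for claim in claims:
            # 2. Keep only checkworthy claims (skip opinions, questions, etc.).
            if not ask('Is this claim checkworthy? JSON: {"checkworthy": bool}', claim)["checkworthy"]:
                continue
            # 3. Generate web search queries for the claim.
            queries = ask('Write search queries to verify the claim. JSON: {"queries": [...]}', claim)["queries"]
            # 4. Retrieve evidence for those queries.
            evidence = search_evidence(queries)
            # 5. Judge the claim against the retrieved evidence.
            verdict = ask('Is the claim supported by the evidence? JSON: {"supported": bool, "reason": str}',
                          f"Claim: {claim}\nEvidence: {evidence}")
            results.append({"claim": claim, **verdict})
        return results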

However, errors can always occur. We try to help users in an interpretable and transparent way by showing all retrieved evidence and the rationale behind each assessment. We hope this could at least help people when dealing with such problems.


I just tried queries similar to the ones in their screenshots with Kagi. Basically asked it the exact same question.

While it answered a general "yes" when the more precise answer was "no", the reasoning in the answer was perfectly on point and covered exactly the same things.

As a general LLM for regular users, FastGPT (their LLM service) is in my opinion "meh" (it lacks conversations, for instance). But it's really impressive that it contains VERY recent data (like news and articles from the last few days) and always provides great references.


I have a project where I take a different approach [0]. I basically extract statements, explicit or implicit, that should be accompanied by a reference to some data but aren't, and I let users find the most relevant data for those statements.

[0] https://datum.alwaysdata.net/


The main problem/drawback of LLMs is their propensity to hallucinate (or lie, to put it plainly), and this is a significant issue even for ChatGPT 4.

Intuitively, just putting your LLM into a workflow/pipeline doesn't really address how to eliminate hallucinations.

For those of us who don’t follow the research closely, can you explain how your findings and approach allow you to utilise LLMs and work around this hard limitation? Said another way, how are you getting around the fact that LLMs themselves regularly output lies/false answers?


You might want to look into integrating DebateSum or OpenDebateEvidence (OpenCaseList) into this tool as sources of evidence. They are uniquely good for these sorts of tasks:

https://huggingface.co/datasets/Hellisotherpeople/DebateSum

https://huggingface.co/datasets/Yusuf5/OpenCaselist
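Loading them is straightforward with the Hugging Face datasets library (sketch below; I haven't double-checked the split or column names, so inspect the printed schema first):

    from datasets import load_dataset

    # Pull the debate evidence corpora straight from the Hugging Face Hub.
    debatesum = load_dataset("Hellisotherpeople/DebateSum")
    opencaselist = load_dataset("Yusuf5/OpenCaselist")

    # Inspect splits and columns before wiring them in as evidence sources.
    print(debatesum)
    print(opencaselist)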


Hi Der_Einzige, thanks for pointing out these two great datasets! We are currently working on supporting customized evidence sources internally and will definitely consider these two datasets in a future version of this open-source project.



IMVHO people do not need "automated fact verification" from a source of trust we can't actually trust; they need summarizers. Most FLOSS users, and not so few computer users in the broadest sense, do use feeds, but they get many posts per day; some days they like to read them all, other days they are busy. Tools to skim news and offer a sort of index for deciding what to read, a kind of smart scoring, would be much more interesting.


Agree. "Fact-checking" can never be more than assertions of a particular bias. I am surprised that this project has received so few critical comments along these lines here.

The idea that "specificity," such as what scientific research aims for, can be better evaluated for truthfulness or approach what "truly matters," as this project purports, is dubious. E.g., why would a notion that is more limited in scope matter more than something more vast (to use the word that it cites as an example)? In addition to its dystopian idea of a "source of truth," it completely dismisses "vague" language in the name of "science" or "factuality," which is utterly the opposite of science, which I thought was to understand ourselves and nature with as few presuppositions as possible.


When I saw Loki as the name, I instantly thought of Grafana Loki for logging. I click on the GitHub and get Libr-AI and OpenFactVerification.

I am not commenting on the actual software and I know names are hard and often overlap, but with something as popular as Loki already used for logging I think it might get confusing.


Hi siffland! Thank you for your feedback. We understand your concern about the potential confusion given the popularity of Grafana Loki in the logging space. When naming our project, we sought a name that encapsulates our goal of combating misinformation. We chose Loki, inspired by the Norse god often associated with stories and trickery, to symbolize our commitment to unveiling the truth hidden within nonfactual information.

When we named our project, we were unaware of the overlap with Grafana Loki. We appreciate you bringing this to our attention! I will discuss this issue with my team in the next meeting, and figure out if there is a better way of solving this. If you have any suggestions or thoughts on how we can better differentiate our project, we would love to hear them.

Thank you again for your valuable input!


How is information qualified as evidence (e.g., the “Evidence Crawler” functionality)?

The best-case scenario would seem to be that results are derived from certain biases built into the model, unless it weighs "factuality" by the number of occurrences of certain statements on the internet, which is as far from a qualification for truthfulness as the biased model is.


The last time I looked, we couldn't even parse anything more complex than simple sentences and build a coherent semantic representation of their meaning. Which tells me this is just some glorified fuzzy-matching algorithm.


I found it very interesting. I had this funny thought that, just like CAPTCHA, maybe soon we will have to ask humans to give their input to fact verification systems at scale.


Interesting. In the Nordics, we have a couple of sites dedicated to fact-checking news stories, done by real people. I think these kinds of automated tools can be helpful too, but they need to be tied to reliable sources. This became pretty apparent to me with the tech news coverage of xz, too. Lots of accidental (or sometimes intentional?) misinformation being spread in news articles. I wrote about it a bit[0]; it was pretty sad to see big international publishers publishing an article based entirely on the journalist's misunderstandings of the situation. Facts and truth are important, especially as gen AI increases the amount of legitimate-looking content online that might not actually be true.

[0] - https://open.substack.com/pub/thetechenabler/p/trust-in-brea...


> In the Nordics, we have a couple of sites dedicated to fact checking news stories, done by real people.

We have them everywhere. The problem, however, is well known: human bias, political engagement from the fact-checkers, etc. AI (without any kind of lock or built-in political bias, etc.) could be the real deal, but because it may not be politically correct, it will never happen.


I wholeheartedly agree on the necessity of linking fact-checking tools to credible sources. Currently, our team's expertise lies primarily in AI, and we find ourselves at a disadvantage when it comes to pinpointing authoritative sources. Acknowledging the challenges posed by the rapid spread of misinformation, as highlighted by recent studies, we developed this prototype to assist in information verification. We recognize the value of collaboration in enhancing our tool's effectiveness and invite those experienced in evaluating sources to join our effort. If our project interests you and you're willing to contribute, please don't hesitate to reach out. We're eager to collaborate and make a positive impact together.


How does AI observe facts in the real world? I find it hilarious to call something fact-checking when it's based on data from the internet.


How does anyone who was not there? The answer should be similar.


How does AI "be" somewhere?


Great idea. However, I wouldn't trust its results, since it relies heavily on LLMs and crawling the web. That means "facts" are whatever the most popular opinion on the internet is. In times of ever more enshittification, you'll probably get your "facts" from LLM-generated SEO websites.

I think the only proper way to verify facts is to derive them from "fundamental facts", e.g. that the earth is round (and even for that there are people who believe the opposite).


Garbage. Requiring an invitation might as well mean non-existent "open source".


Horrible AI-generated imagery, especially with all the AI-garbled words.


The name Loki is such a great fit! WOW!

This is some giant BS that is for sure. Some stupid, literally brain-dead AI searching things created by humans to determine what is a "fact". This is beyond dystopian crap.

We all know all the fact-checker orgs. used by big tech like Facebook and others are filled with hyper biased woke people who do not actually fact-check things but get off on having the power to enforce their beliefs, feelings and biases.

I can already tell this is total BS without even looking into it. What kinds of sources will it use? What ranking will they give them? Snopes? ROFL. Probably just uses some woke-infested, censored and curated language model to determine a fact based on what has the most matches or is THE MOST LIKELY, because that's how AI works. Has absolutely nothing to do with facts.

And it's even worse, we are literally in a time when AI hallucinates things that do not exist. I won't use a stupid AI to find me "facts".


How can I get an invitation code?


[flagged]


hahhah


My friend’s startup: https://factiverse.ai/



