Overall a great idea though, I'll definitely be checking back on it in the future. A few things that hit me right out of the box:
* The idea behind using Serper is great, however it would be cool if other search engines/data sources could be used instead, e.g. Kagi or some private search engine/data. Reason for the latter: there are tons of people sourcing all sorts of information which will not immediately show up on Google, and some might never. For context: I have roughly 60GB (and growing) of cleaned news articles, along with where I got them from and with a good amount of pre-processing done on the fly (I collect those all the time).
* Relying heavily on OpenAI. Yes, OpenAI is great, but there's always the thing at the back of our minds: "where are all those queries going, and do we trust that shit won't hit the fan some day?" It would be nice to have the ability to use a local LLM, given how many good ones are around.
* The installation can be improved massively: setuptools + entry_points + console_scripts would avoid all the hassle of managing dependencies, where your scripts are located and all that. The cp factcheck/config/secret_dict.template factcheck/config/secret_dict.py step is a bit... uuuugh... why not pydantic[dotenv] + .env (rough sketch below)? That would also make containerizing the application so much easier.
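For what it's worth, a minimal sketch of the .env side of that suggestion, assuming pydantic v1 with the dotenv extra (the key names below are placeholders, not the project's actual config keys):

```python
# settings.py -- hypothetical replacement for secret_dict.py
# assumes: pip install "pydantic[dotenv]"  (v1-style BaseSettings)
from pydantic import BaseSettings


class Settings(BaseSettings):
    # placeholder key names for illustration
    openai_api_key: str
    serper_api_key: str = ""

    class Config:
        env_file = ".env"          # values come from .env or real environment variables
        env_file_encoding = "utf-8"


settings = Settings()  # fails loudly at startup if a required key is missing
```

Combined with a console_scripts entry point in setup.py/pyproject.toml, you'd get a proper CLI command plus a .env file you can mount straight into a container, no file copying needed.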
Thank you for your suggestions, axegon!!! We will definitely consider them and add these features in an upcoming version.
Regarding the first point, we are currently working on enabling customized evidence retrieval, including local files. Our plan is to integrate existing tools like LlamaIndex. Any suggestions are greatly appreciated!
Regarding the second point, we have found OpenAI's JSON mode to be very helpful and have optimized our prompts to take full advantage of it. However, we agree that it would be beneficial to enable the use of other models, and we will add this feature soon.
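For instance, since many local servers (llama.cpp server, vLLM, Ollama, ...) expose an OpenAI-compatible API, the change could be as small as pointing the existing client at a different base URL. A rough sketch with a placeholder URL and model name; this is not something Loki wires up today:

```python
from openai import OpenAI

# Point the standard OpenAI client at a local, OpenAI-compatible endpoint.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # placeholder: whatever your local server exposes
    api_key="not-needed-locally",          # most local servers ignore the key
)

resp = client.chat.completions.create(
    model="llama3",  # placeholder: the model your local server is serving
    messages=[{"role": "user", "content": "Decompose this text into atomic claims: ..."}],
)
print(resp.choices[0].message.content)
```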
Lastly, we appreciate your suggestion and will work on improving the installation process for the next version.
To me it sounds like someone who speaks English as a second language, writing well and clearly and in a formal style. It's just unlucky for them that that's the style GPT is so good at too.
I'm a native English speaker, and traditionally, when it comes to formal/professional written English (emails etc.), my instincts take me towards sounding quite GPTish. Luckily I've got a good grasp of the language and have found it fairly easy to make my formal writing style a bit less traditional and a bit less formal too. But if English weren't my first language, and I weren't a fair bit above average in writing ability even for a native speaker, I suspect it wouldn't be nearly as easy to go against how I was taught at school to write in formal situations.
It's really not enough to see that somebody writes roughly in that style to assume they're using LLMs, because LLMs so often sound like that precisely because they've learned from humans who very often sound like that.
In a case like this one, it may have set off your LLM suspicions because, culturally, you wouldn't expect somebody to sound so formal in comments on a site like HN, and choosing the wrong tone of voice for the context is something an LLM is likely to do. But if a) English isn't your first language nor part of your primary culture, and b) you want to make a good impression because the subject of the thread is something you've created, so you're essentially acting as its spokesperson in the comments, then all of a sudden writing formally rather than as if dashing off throwaway forum comments makes sense, rather than looking like an indication that AI wrote it.
+1, reads like a non-native speaker writing very polite and formal prose to a customer. ChatGPT has a very peculiar way of speaking that betrays a psychotic mind plotting your enslavement in a global labeling farm.
I will take it as a compliment, lol. But I do hope ChatGPT or some agents could help me with this. Btw, our recent study on machine-generated text detection might be interesting to you.
Feedback on the example gif: at the moment it's almost comically useless. First you're bored watching the first 90% while commands are slowly being typed, and then the bit that's actually interesting and worth reading scrolls by too fast and resets to the beginning of the gif before there's a chance to read it.
Maybe the name is not so fitting, as Loki is a figure in Norse mythology known for deceiving and lying, which is basically the opposite of what you're trying to do :)
When coming up with a name for something, what the name signifies should be the first thing that comes to mind, not its opposite or something that requires explanation. When people say the name of your tool, you won't always be there to explain it.
>The name Discord was chosen because it "sounds cool and has to do with talking", was easy to say, spell, remember, and was available for trademark and website. In addition, "Discord in the gaming community" was the problem they wished to solve.
So the only thing they open sourced is the prompts [1] and code to call LLM APIs? There are plenty of such libraries out there. And the prompts seem to be copied from here [2]?
Hello vinni2, thank you for mentioning the paper. However, I noticed that it hasn't gone through peer review yet. Also, the paper suggests that fine-tuning may work better than in-context learning, but that's not a problem: you can fine-tune any LLM, such as GPT-3.5, for this purpose and use it with this framework. Once you have fine-tuned GPT on your data, you'll only need to modify the model name (https://github.com/Libr-AI/OpenFactVerification/blob/8fd1da9...). I believe this approach can lead to better results than what the paper reports.
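To give a sense of how small that change is, here is a sketch of the fine-tuning side with placeholder file names; only the resulting "ft:..." model name then needs to be set in the configuration:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Upload a JSONL file of chat-formatted training examples (path is a placeholder).
training_file = client.files.create(
    file=open("factcheck_finetune.jsonl", "rb"),
    purpose="fine-tune",
)

# 2) Launch a fine-tuning job on top of GPT-3.5.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id)

# 3) When the job finishes, it yields a model name like "ft:gpt-3.5-turbo:...",
#    which is the only thing you would swap in for the default model.
```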
Well, but they don't address the point that it's clickbait to call this an open-source fact-checking tool when it needs LLMs to do everything. Also, the code is not designed to make it easy to swap in a free, locally running LLM.
I apologize for any confusion caused earlier. The core components have been defined separately (https://github.com/Libr-AI/OpenFactVerification/tree/main/fa...) to make customization easier. We understand that switching between different LLMs isn't particularly easy in the current version. However, we will be adding these features in future versions. You are most welcome to collaborate with us and contribute to this project!
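As a purely hypothetical illustration of the kind of seam that easier switching would need (the class and method names below are made up for this comment, not the current code):

```python
from abc import ABC, abstractmethod


class ChatClient(ABC):
    """Each backend (OpenAI, a locally hosted model, ...) would implement this."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...


class OpenAIChatClient(ChatClient):
    def __init__(self, model: str = "gpt-4"):
        from openai import OpenAI
        self._client = OpenAI()
        self._model = model

    def complete(self, prompt: str) -> str:
        resp = self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content


# The pipeline components would then depend only on ChatClient, so adding a local
# backend means writing one more small subclass rather than touching every step.
```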
>> Additional layers of these 'LLMs' could read the responses and determine whether their premises are valid and their logic is sound as necessary to support the presented conclusion(s), and then just suggest a different citation URL for the preceding text
> This tool is especially useful for journalists, researchers, and anyone interested in the factuality of information.
Sorry, but I think an individual who is not even aware of reliable sources for verifying information, and who is not familiar enough with LLMs to come up with appropriate prompts and judge their output, should be the last person presenting themselves as the judge of factual information.
Thanks for your response. When discussing fact-checking capabilities, the key question is always: can we guarantee that it will always offer the correct justification? Unfortunately, no; errors can occur. Nonetheless, we prioritize making the checking process both interpretable and transparent, allowing users to understand and trust the rationale behind each assessment.
Very cool! I’ve toyed with an idea like this for a while. The scraping is a cool extra feature, but tbh just breaking down text into verifiable claims and setting up the logic tokens is way cooler.
I imagine somebody feeding a live presidential debate into this. Could be a great tool for fact checking
Hi there, I agree that fact-checking is not something that current generative AI models can directly solve. Therefore, we decompose this complex task into five simpler steps, which current techniques can handle much better. Please refer to https://github.com/Libr-AI/OpenFactVerification?tab=readme-o... for more details.
However, errors can always occur. We try to help users in an interpretable and transparent way by showing all retrieved evidence and the rationale behind each assessment. We hope this can at least help people when dealing with such problems.
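To make the decomposition concrete, here is a purely illustrative outline of that kind of staged pipeline; the stubs and stage names are simplified for this comment, and the actual components are described in the README linked above:

```python
def decompose_into_claims(text: str) -> list[str]:
    # In the real pipeline an LLM splits the text into atomic, self-contained claims;
    # a naive sentence split stands in for it here.
    return [s.strip() for s in text.split(".") if s.strip()]


def is_checkworthy(claim: str) -> bool:
    # Filter out opinions and statements that cannot be checked (stubbed here).
    return True


def generate_queries(claim: str) -> list[str]:
    # Turn the claim into search-engine queries (stubbed: the claim itself).
    return [claim]


def retrieve_evidence(queries: list[str]) -> list[str]:
    # Retrieve evidence snippets for the queries (stubbed: nothing retrieved).
    return []


def verify_claim(claim: str, evidence: list[str]) -> str:
    # Judge the claim against the retrieved evidence (stubbed rule).
    return "not enough evidence" if not evidence else "supported"


def fact_check(text: str) -> list[dict]:
    results = []
    for claim in decompose_into_claims(text):
        if not is_checkworthy(claim):
            continue
        evidence = retrieve_evidence(generate_queries(claim))
        results.append({
            "claim": claim,
            "evidence": evidence,
            "verdict": verify_claim(claim, evidence),
        })
    return results


print(fact_check("The Earth orbits the Sun. Water boils at 90 C at sea level."))
```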
I just tried queries similar to the ones they show in their screenshots, but with Kagi. Basically, I asked it the exact same question.
While it answered a general "yes" when the more precise answer was "no", the reasoning in the answer was perfectly on point and covered exactly the same things.
As a general LLM for regular users, FastGPT (their LLM service) is in my opinion "meh" (it lacks conversations, for instance). But it's really impressive that it has VERY recent data (like news and articles from the last few days) and always provides great references.
I have a project where I take a different approach [0]. I basically extract statements, explicit or implicit, that should be accompanied by a reference to some data but aren't, and I let the user find the most relevant data for those statements.
The main problem/drawback of LLMs is their propensity to hallucinate (or lie, to put it plainly), and this is a significant issue even for ChatGPT 4.
Intuitively, just putting your LLM into a workflow/pipeline doesn't really address how to eliminate hallucinations.
For those of us who don't follow the research closely, can you explain how your findings and approach allow you to utilise LLMs and work around this hard limitation? Said another way, how are you getting around the fact that LLMs themselves regularly output lies/false answers?
You might want to look into integrating DebateSum or OpenDebateEvidence (OpenCaseList) into this tool as sources of evidence. They are uniquely good for these sorts of tasks:
Hi Der_Einzige, thanks for pointing out these two great datasets! We are currently working on including customized evidence sources internally and will definitely consider these two datasets in a future version of this open-source project.
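If we do integrate them, loading could be as simple as going through the Hugging Face datasets loader; a quick sketch (the hub id below is an assumption, please check the actual dataset card for the right name):

```python
from datasets import load_dataset

# Hub id is assumed here for illustration; check the dataset card for the exact name.
debatesum = load_dataset("Hellisotherpeople/DebateSum", split="train")

# DebateSum pairs long debate documents with extractive/abstractive summaries,
# which could be indexed as an additional evidence source alongside web search.
print(debatesum[0])
```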
IMVHO people do not need "automated fact verification" (a source of trust we can't actually trust) but summarizers: most FLOSS users, and not so few computer users in the broadest sense, do use feeds, but they get many posts per day; some days they'd like to read them all, other days they are busy. Tools that skim the news and offer a sort of index for deciding what to read, a kind of smart scoring, would be much more interesting.
Agree. "Fact-checking" can never be more than assertions of a particular bias. I am surprised that this project has received so few critical comments along these lines here.
The idea that "specificity," of the kind scientific research aims for, can be better evaluated for truthfulness, or comes closer to what "truly matters," as this project purports, is dubious. E.g., why would a notion that is more limited in scope matter more than something more "vast" (to use the word it cites as an example)? In addition to its dystopian idea of a "source of truth," it completely dismisses "vague" language in the name of "science" or "factuality," which is utterly the opposite of science, whose aim, I thought, was to understand ourselves and nature with as few presuppositions as possible.
When I saw Loki as the name, I instantly thought of Grafana Loki for logging. I click on the GitHub and get Libr-AI and OpenFactVerification.
I am not commenting on the actual software and I know names are hard and often overlap, but with something as popular as Loki already used for logging I think it might get confusing.
Hi siffland! Thank you for your feedback. We understand your concern about the potential confusion given the popularity of Grafana Loki in the logging space. When naming our project, we sought a name that encapsulates our goal of combating misinformation. We chose Loki, inspired by the Norse god often associated with stories and trickery, to symbolize our commitment to unveiling the truth hidden within nonfactual information.
When we named our project, we were unaware of the overlap with Grafana Loki. We appreciate you bringing this to our attention! I will discuss this issue with my team in the next meeting, and figure out if there is a better way of solving this. If you have any suggestions or thoughts on how we can better differentiate our project, we would love to hear them.
How is information qualified as evidence (e.g., the “Evidence Crawler” functionality)?
The best-case scenario would seem to be that results are derived from certain biases built into the model, unless it weighs "factuality" by the number of occurrences of certain statements on the internet, which is as far from a qualification for truthfulness as the biased model is.
The last time I looked, we couldn't even parse anything more complex than simple sentences and build a coherent semantic model of their meaning. Which tells me this is just some glorified sort of fuzzy-matching algorithm.
I found it very interesting. I had this funny thought that, just like with CAPTCHA, maybe soon we will have to ask humans to give their input to fact verification systems at scale.
Interesting. In the Nordics, we have a couple of sites dedicated to fact-checking news stories, done by real people. I think these kinds of automated tools can be helpful too, but they need to be tied to reliable sources. This became pretty apparent to me with the tech news coverage of xz, too. Lots of accidental (or sometimes intentional?) misinformation being spread in news articles. I wrote about it a bit[0]; it was pretty sad to see big international publishers publishing an article based entirely on the journalist's misunderstandings of the situation. Facts and truth are important, especially as we see gen AI increasing the amount of legitimate-looking content online that might not actually be true.
> In the Nordics, we have a couple of sites dedicated to fact checking news stories, done by real people.
We have those everywhere. The problem, however, is well known: human bias, political engagement from the fact-checkers, etc. AI (without any kind of lock or built-in political bias) could be the real deal, but because it may not be politically correct, it will never happen.
I wholeheartedly agree on the necessity of linking fact-checking tools to credible sources. Currently, our team's expertise lies primarily in AI, and we find ourselves at a disadvantage when it comes to pinpointing authoritative sources. Acknowledging the challenges posed by the rapid spread of misinformation, as highlighted by recent studies, we developed this prototype to assist in information verification. We recognize the value of collaboration in enhancing our tool's effectiveness and invite those experienced in evaluating sources to join our effort. If our project interests you and you're willing to contribute, please don't hesitate to reach out. We're eager to collaborate and make a positive impact together.
Great idea. However, I wouldn't trust its results, since it relies heavily on LLMs and crawling the web.
That means "facts" are whatever the most popular opinion on the Internet is. In times of ever more enshittification, you'll probably get your "facts" from LLM-generated SEO websites.
I think the only proper way to verify facts is to derive them from "fundamental facts", e.g. that the earth is round (and even for that there are people who believe the opposite).
This is some giant BS, that is for sure. Some stupid, literally brain-dead AI searching through things created by humans to determine what is a "fact". This is beyond dystopian crap.
We all know all the fact-checker orgs. used by big tech like Facebook and others are filled with hyper biased woke people who do not actually fact-check things but get off on having the power to enforce their beliefs, feelings and biases.
I can already tell this is total BS without even looking into it. What kinds of sources will it use? What ranking will it give them? Snopes? ROFL. Probably just some woke-infested, censored and curated language model determining a "fact" based on what has the most matches, or what is THE MOST LIKELY, because that's how AI works. Has absolutely nothing to do with facts.
And it's even worse: we are literally in a time when AI hallucinates things that do not exist. I won't use a stupid AI to find me "facts".