I can speak to neural networks and graphs generally, if not this specific paper. Neural networks can do reasoning, but their internal representation of information is illegible. Graphs let us represent information legibly.
Speaking as someone ignorant of the field who has been trying to figure out whether KGs are superseded by LLMs: is there a source you can think of for working out whether both have a place in an architecture, or whether LLMs alone are sufficient? What sort of use cases require both?
Current performant (high-accuracy) LLMs have a quadratic cost (space and time) in sequence length; they have a finite size, typically much smaller than any KG of note; their connections are all fuzzy; they aren't especially amenable to small updates; training and fine-tuning don't work well with high-entropy data; and most computations physically cannot be done in any single pass through an LLM, regardless of how it was trained.
Those constraints together create a landscape where, if you have a big knowledge graph, there will invariably be important questions the LLM cannot appropriately answer about it, no matter which strategy you use to try to ramrod the KG into an LLM architecture. If you don't train/fine-tune the LLM on the KG, it doesn't have enough context to answer your questions. If you do, your KG doesn't have enough data duplication for training to work well. If you manage to train it anyway, you can't ask compound, multi-hop questions because of the LLM's bounded circuit depth. If you try anyway and just feed the results back into the LLM as input, you get compounding error because the whole thing is fuzzy. If you try to circumvent that with error-reduction techniques, you tend to blow through current context windows (quadratic costs) and still get unreliable results. And so on.
None of that is necessarily true forever, but suppose you have a problem where a KG is a natural fit but some of the data is a little fuzzy (you have pretty good graphs of how cities, roads, individuals, companies, and whatnot are related, but it's not perfect, and some of it is textual or otherwise not appropriately structured). The KG can answer a number of queries very well, limited by the lack of structure in the node/edge representations. An LLM on its own can't do much: it can't compress all those possible edges into its weights, it can't fit the whole KG in a context window, it can't be appropriately fine-tuned to the data, and even if it could, it couldn't recurse well without compounding error. If you instead use the LLM as a pre-processing step on the nodes, or as a fuzzy neighbor search (restricted by the KG), or in some other supporting role, you get a data structure that looks a lot more like a better-prepared, clean KG, and you can run traditional KG algorithms to ask questions like who might need your tax prep services or whatever (sketched below). Getting an LLM alone to do that for the same cost will take a _ton_ of engineering beyond what I've seen poured into the space.
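For concreteness, here is a minimal sketch of that division of labor, assuming a networkx graph and a hypothetical `extract_relations` call standing in for whatever LLM you'd use to clean up unstructured node text. Everything here is illustrative, not a reference implementation.

```python
import networkx as nx

def extract_relations(note: str) -> list[tuple[str, str, str]]:
    """Hypothetical LLM call: turn free-text notes into (subject, relation, object)
    triples. Stubbed here so the sketch runs without any model."""
    # e.g. an LLM might return this for "Acme LLC just opened an office in Springfield"
    return [("Acme LLC", "located_in", "Springfield")]

g = nx.DiGraph()
# Structured part of the KG: edges we already trust.
g.add_edge("Springfield", "Route 66", relation="connected_by")
g.add_edge("Jane Doe", "Acme LLC", relation="works_at")

# Fuzzy part: run the LLM once per unstructured note, then add the
# extracted edges back into the graph so ordinary graph algorithms apply.
for note in ["Acme LLC just opened an office in Springfield"]:
    for subj, rel, obj in extract_relations(note):
        g.add_edge(subj, obj, relation=rel)

# Now a traditional KG query, with no LLM in the loop: who or what is within
# two hops of Springfield (e.g. candidates for locally targeted outreach)?
undirected = g.to_undirected()
nearby = nx.single_source_shortest_path_length(undirected, "Springfield", cutoff=2)
print(sorted(n for n in nearby if n != "Springfield"))
```

The LLM's fuzziness is confined to a one-shot extraction step whose output you can inspect; the recursive, multi-hop part of the question runs on the graph, where it is cheap and exact.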
Materializing the attention matrix leaves you slower than FlashAttention much of the time, simply because you become severely memory-bandwidth bound; that's worse than the mild extra computation that comes from streaming it (and especially from re-computing the attention for the backward pass, though that needs no extra memory bandwidth, just minor compute).
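To make that trade-off concrete, here is a small NumPy sketch (not FlashAttention itself, just the online-softmax idea it relies on): the naive version materializes the full n x n score matrix, while the streamed version only ever holds one tile of scores at a time.

```python
import numpy as np

def naive_attention(q, k, v):
    # Materializes the full (n x n) score matrix: O(n^2) memory traffic.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def streamed_attention(q, k, v, block=128):
    # Processes K/V in tiles, keeping running max/denominator/accumulator,
    # so no more than one (n x block) score tile exists at a time.
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full((n, 1), -np.inf)   # running row max
    l = np.zeros((n, 1))           # running softmax denominator
    acc = np.zeros((n, v.shape[-1]))
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T * scale
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        correction = np.exp(m - m_new)
        p = np.exp(s - m_new)
        l = l * correction + p.sum(axis=-1, keepdims=True)
        acc = acc * correction + p @ vb
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((512, 64)) for _ in range(3))
assert np.allclose(naive_attention(q, k, v), streamed_attention(q, k, v), atol=1e-6)
```

On a GPU the streamed form is what keeps the working set in fast on-chip memory instead of shuttling the full score matrix through HBM.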
> - Indicate degree of confidence in annotation (note that AGI hypergraph systems have TruthValue and also AttentionValue, like attention networks)
From "AutoML-Zero: Evolving Code That Learns" https://news.ycombinator.com/item?id=23787359 :
> How does this compare to MOSES (OpenCog/asmoses) or PLN? https://github.com/opencog/asmoses https://scholar.google.com/scholar?hl=en&as_sdt=0%2C43&q=%22... (2006)
opencog/atomspace is a hypergraph database for knowledge graphs, with TruthValue and AttentionValue annotations on atoms. https://github.com/opencog/atomspace
examples/python/create_atoms_simple.py: https://github.com/opencog/atomspace/blob/master/examples/py...
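Roughly what that example does, as a sketch from memory; the exact module paths and initialization call vary across atomspace releases, so treat these imports as assumptions.

```python
# Sketch based on examples/python/create_atoms_simple.py; module layout may
# differ between atomspace versions, so these imports are an assumption.
from opencog.atomspace import AtomSpace, TruthValue
from opencog.type_constructors import *
from opencog.utilities import initialize_opencog

atomspace = AtomSpace()
initialize_opencog(atomspace)  # newer releases use set_default_atomspace(atomspace)

# Nodes and a link in the hypergraph, annotated with a TruthValue
# (strength, confidence) -- the "degree of confidence in annotation" above.
cat = ConceptNode("cat")
animal = ConceptNode("animal")
inherits = InheritanceLink(cat, animal)
inherits.tv = TruthValue(0.9, 0.8)

print(inherits)
```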
- [ ] Clone Atomspace hypergraph with RDFstar and SPARQLstar.
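One possible starting point for that item, sketched without any RDF library: emit each Atomspace-style link as a plain triple plus an RDF-star annotation carrying its TruthValue. The ex: prefix and the tvStrength/tvConfidence predicates are invented for illustration, not an existing vocabulary.

```python
# Hypothetical sketch: serialize an Atomspace-style link plus its TruthValue
# as Turtle-star, using a quoted triple as the subject of the annotation.
def link_to_turtle_star(source, relation, target, strength, confidence):
    triple = f"ex:{source} ex:{relation} ex:{target}"
    return (
        f"{triple} .\n"
        f"<< {triple} >> ex:tvStrength {strength} ; ex:tvConfidence {confidence} .\n"
    )

print("@prefix ex: <http://example.org/> .\n")
print(link_to_turtle_star("cat", "inheritsFrom", "animal", 0.9, 0.8))
```

SPARQL-star can then filter on the annotations (e.g. only edges above some strength), which is the part a plain RDF reification mapping makes much more verbose.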
ONNX is a standard and also now an ecosystem for exchange of neural networks. https://en.wikipedia.org/wiki/Open_Neural_Network_Exchange
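For example, exporting a small PyTorch network to the interchange format, assuming PyTorch with its bundled ONNX exporter is installed:

```python
# Minimal sketch: serialize a toy network as an ONNX graph that other
# runtimes (onnxruntime, TensorRT, etc.) can load.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(16, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)
dummy = torch.randn(1, 16)  # example input fixes the traced shapes
torch.onnx.export(model, dummy, "tiny_classifier.onnx")
```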
RDF HDT (Header, Dictionary, Triples) is fast to read but not to write.
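Reading one looks roughly like this, assuming the pyHDT (hdt) Python bindings and a pre-built example.hdt file; the package and method names here are from memory, so treat them as assumptions.

```python
# Assumes the pyHDT bindings (pip install hdt) and an existing example.hdt;
# updating an HDT file generally means regenerating it from scratch, which is
# the "fast to read but not write" trade-off.
from hdt import HDTDocument

document = HDTDocument("example.hdt")
triples, cardinality = document.search_triples("", "", "")  # wildcards match everything
print("total triples:", cardinality)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
    break
```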
From https://news.ycombinator.com/item?id=35810320 :
> Is there a better way to publish Linked Data with existing tools like LaTeX, PDF, or Word? Which support CSVW? Which support RDF/RDFa/JSON-LD?