> Networks trained on a path integration task almost always learn to optimally encode position, but almost never learn lattice cells (hexagonal or square) to do so... Our main message is that it is highly improbable that DL models of path integration would have produced grid cells as a novel prediction simply from task-training, had grid cells not already been known to exist.
This carries on into an extremely nuanced and technical discussion of the architecture of specific models used.
It's basically pointing out that in some previous papers, authors thought that grid cells always arose when solving this problem, but in fact this only occurs when specific implementation choices are made. So those papers were incomplete, and the phenomenon isn't as clearcut as before.
However! This new complexity still tells us something; if only certain architectural choices produce grid cells, then brains must (in some sense) implement those architectural choices. And the models that don't produce grid cells must be doing something differently to how the brain does things.
In summary I think this paper is probably saying a lot less than most people here are reading into it; some papers are accidentally oversimplifying, and we've found more complexity that needs to be explained. More thorough hyperparameter-space exploration can identify brittle results. It's not some deep point about whether it is philosophically or logically consistent to compare deep NNs to the brain.
Interlacing isn't 4-way or 6-way, it's ~10^3-way, and each interlaced connection has a weight that's nonlinearly time-dependent based on how long since the last firing.
Every cyclic connection is potentially a self-sustaining oscillator.
None of these features are efficiently implemented in current silicon.
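To make the time-dependence point concrete, here's a minimal toy sketch (Python/NumPy) of a unit whose input weights are scaled by how long ago each input last fired. This is purely illustrative, not a biological model; names like `ToyNeuron` and `synaptic_efficacy` are invented for the example.

```python
import numpy as np

def synaptic_efficacy(ms_since_last_spike, tau=50.0):
    """Toy nonlinear, time-dependent weighting: a connection that fired
    recently is depressed and recovers exponentially with time constant tau (ms).
    Real short-term plasticity is far richer than this."""
    return 1.0 - np.exp(-ms_since_last_spike / tau)

class ToyNeuron:
    """A point unit integrating ~10^3 inputs, each scaled by a time-dependent
    efficacy -- unlike a standard ANN unit, whose weights are fixed within a
    forward pass."""
    def __init__(self, n_inputs=1000, threshold=5.0):
        self.w = np.random.randn(n_inputs) * 0.5       # static base weights
        self.ms_since_spike = np.full(n_inputs, 1e9)   # "never fired" -> full efficacy
        self.threshold = threshold

    def step(self, active, dt=1.0):
        # drive depends on which inputs fire AND how recently they last fired
        drive = np.sum(self.w[active] * synaptic_efficacy(self.ms_since_spike[active]))
        # bookkeeping: advance all clocks, reset the inputs that just fired
        self.ms_since_spike += dt
        self.ms_since_spike[active] = 0.0
        return drive > self.threshold

neuron = ToyNeuron()
fired = neuron.step(active=np.random.rand(1000) < 0.05)  # ~50 random inputs spike
```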
"Caution when comparing neural networks to brains" is underselling it. They're profoundly different kinds of network; nobody is building (or even publicly planning) silicon that has that kind of interconnect breadth.
What do you mean? Any image classifier will use way more than 4 kernels for convolution. All those layers are interlaced. Furthermore they also contain fully connected layers, with neurons integrating way more than 10^3 signals.
The reason there aren't many more fully connected layers is that this doesn't work. Actually, many of the key developments in NNs are architectural: max pooling, ReLUs, U-Net. All are key for modern networks, all architectural.
Sorry, that was hastily written & could have been clearer.
It is, of course, very common to perform (eg) convolution with a larger kernel size, or to use a dense layer.
However, unlike wet-neurons:
* Convolution has the same local shape for each cell (the kernel weights are shared across locations; see the sketch below).
* Convolution has no self-suppression for recent activation, vs time-dependent, nonlinear response in wet cells.
* Current silicon offers no performance advantage for interconnect to adjacent cells (could be done).
With ~80 billion neurons in a brain, 1000 is not like a dense layer at all.
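To make the weight-sharing bullet concrete, here's a minimal sketch, assuming PyTorch, showing that a convolution reuses one small kernel at every spatial position, so the parameter count doesn't grow with image size.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=False)

# One 3x3 kernel = 9 parameters, applied identically at every position...
print(sum(p.numel() for p in conv.parameters()))   # 9

# ...regardless of whether the input is 32x32 or 1024x1024:
small = conv(torch.randn(1, 1, 32, 32))            # -> 1x1x30x30
large = conv(torch.randn(1, 1, 1024, 1024))        # -> 1x1x1022x1022

# An "unshared" per-location version of the same receptive field would need
# roughly 9 * H * W separate weights -- closer in spirit to wet neurons,
# where each cell has its own local connection pattern.
```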
If you are interested in learning about the intersection of Artificial Neural Networks and Biological Neural Network research, I recommend "The Self-Assembling Brain - How Neural Networks Grow Smarter" by Peter Robin Hiesinger. He attempts to bridge research from both fields of study to identify where there are commonalities and differences in the design of these networks.
What I understand is that he claims the underlying algorithms that govern our behavior, and how it evolves from birth, are ingrained in our genetic code. Current neural network models try to model our behavior, but they are way behind when it comes to discovering those ingrained algorithms.
To me one important aspect is the existence of adversarial attacks on neural networks.
They essentially prove that the neural network never "understood" its data. It hasn't found some general categories which correspond somewhat to human categories.
Human brains can be tricked too, but never this way and never beyond our capacities for rational thought.
"Never this way and never beyond our capacities for rational thought" makes it so nothing can be stated about the "understanding" any neural network has about any data short of an AGI. Which is obviously not too useful because an incomplete model of something is not the same as not having the model at all. Eg. if the model is generating too many arms, that still means it has extracted some model of what an arm looks like, even if it hasn't fully internalized the fact that humans have up to 2 arms (although depending on the training set this also gets messy as it isn't uncommon for religious imagery of various religions to depict humanoid figures with several arms, where the difference is related to context not available to the AI).
Humans can be tricked very easily by optical illusions and it isn't uncommon for some illusions to be intentionally built to 'harm' people (eg patterns on the floor which make you lose your sense of balance). Even with rational thought such things can be difficult to deal with. We're probably just as vulnerable to adversarial attacks, the issue being that unlike artificial neural networks we don't have an easy feedback loop to run millions of times in guiding a similar adversarial search.
The main difference from optical illusions is that we are aware of them and integrate this knowledge into our model of the world, so that we can deal with them to a certain extent.
Con artists aren't a big problem; if their tricks worked on everyone, con artists would be the richest people in the world. Like, just con Elon Musk out of his billions, why hasn't anyone done that yet if it is so easy to trick humans?
> Like, just con Elon Musk out of his billions, why hasn't anyone done that yet if it is so easy to trick humans?
Like getting him to spend $44 billion for Twitter?
(An ex of mine is convinced that Musk is a con artist, but she's also a literal card carrying anarcho-communist; I'm not that cynical about Musk).
Even at a lower level, I had my bank[0] call up and tell me there was too much money in my account and they'd really recommend a wealth management consultation to avoid me being scammed, and that wasn't even £100k.
That said, I was thinking mainly of street cons — shell games, possibly even shoplifting and pickpocketing — as the previous discussion was about optical illusions. Business level scams are about a broader category of cognitive bias, and I'd say almost all gambling is that type of thing, likewise bitcoin, dulce et decorum est, and populist politics.
[0] or at least they said they were, but I said no before getting to the point where asking for proof the call wasn't itself a scam would've been useful
> Human brains can be tricked too, but never this way and never beyond our capacities for rational thought.
It's inconceivable to me that humans wouldn't be trickable by exactly the same sort of adversarial inputs-- it's just that because we're not differentiable there is no feasible way to find these inputs.
People have constructed fairly impressive optical illusions based on our understanding of the neural structure of the early stages of vision processing. The fact that we lack more complicated examples like "random images" that make us feel hate or disgust or that we're convinced are our mother is simply due to our lack of understanding and access to the higher neural structures.
I imagine that if the structure of a human brain was well-understood and mathematically describable the way NNs are, generating adversarial inputs would be completely feasible.
This is often a big part of how neural networks are trained. Take labelled photos (really it applies in abstract ways to other data), add a little noise, mirror, shrink and stretch, translate to a different area, etc. One piece of data becomes several training images.
And no, this standard practice does not eliminate adversarial examples.
No, not while training: just ask the model to predict for a few transforms and take the mode, simulating the fact that humans also have multiple frames' worth of information about any object, from slightly different angles.
This is actually a thing, yes (and it does increase robustness to adversarial examples). This is particularly useful if you apply (random) low-pass filtering (or denoising, or DCT-based compression, or anything that messes up high-frequency content) to the image (besides random cropping/rescaling), since adversarial examples often rely on manipulating (human-imperceptible) high-frequency information.
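A minimal sketch of that idea (test-time augmentation with a mild low-pass step, then a majority vote), assuming PyTorch/torchvision; `model` is any image classifier and the transform parameters are just placeholders:

```python
import torch
import torchvision.transforms as T

def predict_with_tta(model, image, n_views=16):
    """Predict by taking the modal class over randomly transformed views.
    `image` is a CHW tensor; `model` maps an NCHW batch to class logits."""
    augment = T.Compose([
        T.RandomResizedCrop(224, scale=(0.8, 1.0)),      # random crop/rescale
        T.RandomHorizontalFlip(),
        T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)), # mild low-pass filtering
    ])
    with torch.no_grad():
        votes = [model(augment(image).unsqueeze(0)).argmax(dim=1).item()
                 for _ in range(n_views)]
    return max(set(votes), key=votes.count)              # majority vote (mode)
```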
I think this kind of meta-understanding probably exists on a continuum: fruit flies and slime molds sit at one end, current AI sits somewhere in the lower third, we sit somewhere in the upper third, and a future AI with a vast number of AI brain components and huge amounts of training will eventually sit at the other end, possibly as far from us as we are from the fruit fly.
Optical illusions are one thing, but, I don't know, "Predictably Irrational", "Thinking, Fast and Slow", or just whatever is happening all around.
We do not understand our data.
In general yes, I believe most people will only accept a thinking machine when it can reproduce all our pitfalls. Because if we see something and the computer doesn't, then it clearly still needs to be improved, even if it's an optical illusion.
But our bugs aren't sacred and special. They just passed Darwin's QA some thousands of years ago.
I'd agree about sacred, but I have a hunch they may indeed be special… or at least useful. Current AI requires far more examples than we do to learn from, and I suspect all our biases are how evolution managed to do that.
Humans are trained on petabytes of data. From birth, we ingest sights, sounds, smells etc. Imagine a movie of every second of your life. And an audio track of every second of your life. Etc. Etc.
Tesla autopilot has a movie of every second it's active, for every car in the fleet that uses it. It has how many lifetimes of driving data now? And yet, it's… merely ok, nothing special, even when compared to all humans including those oblivious of the fact they shouldn't be behind a wheel.
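A rough back-of-envelope check on the "petabytes" claim above; the sensory-throughput figure is an assumed, illustrative number, not a measurement:

```python
# Assume ~1 MB/s of effective sensory input (vision + audio + everything else,
# heavily compressed) -- an illustrative guess, not a measured figure.
bytes_per_second = 1_000_000
seconds_per_year = 365 * 24 * 3600      # ~3.15e7
years = 30

total_bytes = bytes_per_second * seconds_per_year * years
print(f"{total_bytes / 1e15:.1f} PB")    # ~0.9 PB, i.e. petabyte scale
```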
Not sure "biases" gives the evolved structures in our brain enough credit. Maybe the functions of those structures could be emergent in a large enough network, but that would be a very different context from what a human sees in its extremely rapid development. The rapid learning could be from the unique architecture. The free-running feedback loop (consciousness) that we have seems like a good example of how different our architecture is, with our ability to continuously prompt ourselves and learn from those prompts.
Yeah, but I disagreed with your point "Current AI requires far more examples than we do to learn from", since I think you need to count the amount of data that was seen by all your ancestors, maybe even starting from the first self-replicating molecule, billions of years ago.
Fair enough. I wouldn't go quite that far (at absolute most I would accept counting since the first prototype of a neural cell), but the estimates I've seen for the data/training AGI would need, if it requires a simulated re-run of modern human evolution to be trained, are more than what current (or at least recent) AI uses.
> Human brains can be tricked too, but never this way and never beyond our capacities for rational thought.
And yet we're living in the age of misinformation where propaganda spreads like never before. Isn't that essentially an adversarial attack which shows how susceptible humans are to them too?
For the record, I don't think that NNs are brains either. I think our trouble differentiating sufficiently between them is because we don't really know what sentience really is.
We've been doing significant research using HTM[1][2][3][4] w/ SDR[5] similar to the proprietary implementation of Semantic Folding[6] and were able to classify literally gigabytes of documents per second on FPGAs. HTM aims to resemble the architecture of the Neocortex.
I simply don't understand why literally everyone immediately jumps at CNNs, RNNs (transformers et al.) -- they're extremely expensive, slow and definitely not usable for SIGINT-sized intel projects.
> I simply don't understand why literally everyone immediately jumps at CNNs, RNNs (transformers et al.) -- they're extremely expensive, slow
Because for text they work a lot better.
HTMs are competitive with other non-NN techniques on small datasets (see [1] you listed) but nothing particularly amazing.
I'd speculate this is because they use bag-of-words variants like LSI and TF-IDF. Prior to Transformers this was a competitive technique, and you could get state-of-the-art results on most things using similar techniques.
This representation of the data matters a lot. Even just switching to a word embedding representation and using an SVM or something gives a decent gain in most circumstances.
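A minimal sketch of that kind of pre-transformer baseline (TF-IDF bag-of-words features feeding a linear classifier), assuming scikit-learn; the toy texts and labels are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Placeholder training data: a handful of documents and integer class labels.
texts = ["cheap flights and hotel deals", "new gpu benchmark results released",
         "discount holiday packages", "open source ml library update"]
labels = [0, 1, 0, 1]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigram + bigram bag-of-words features
    LinearSVC(),                          # linear classifier on top
)
clf.fit(texts, labels)
print(clf.predict(["budget airline tickets", "faster training on new gpus"]))
```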
But transformers are much better, particularly on harder tasks (eg question answering on long documents). You can't really see how significant this difference is on these small datasets, but as an example BertGCN is getting over 89% accuracy (HTM in [1] gets 83%).
It's possible (likely!) some of this gain is from the better representation Transformers use, not just the model.
> definitely not usable for SIGINT-sized intel projects
If SIGINT in this context means signal intelligence (on text data) then I assure you that they are being used.
We have seen significantly higher accuracy than that, more than competitive with the transformer approach we’ve been running in parallel. For arbitrary texts we’ve seen accuracies of at least 92%.
> If SIGINT in this context means signal intelligence (on text data) then I assure you that they are being used.
Maybe, I don’t know what projects you’ve been involved with. For the terabit-level pre-sorting of SIGINT data they’re absolutely definitely not used. If at all on the selected information of interest. My information concerns intel actors in Europe.
The aspect of AI that makes me think something related is going on is how artifacts look in image generation systems like Stable Diffusion.
Often these systems will have really bizarre artifacts, people with 3 arms, etc. However, at the same time, when you glance at the output without looking carefully you will sometimes miss these artifacts even though they should be absolutely glaring.
Not sure if I'm missing a subtle nuance in your point, but to me those "artifacts" are completely expected. Artifacts like 3 arms are the patterns / outputs in the model, but since it doesn't have a fundamental understanding of the patterns/objects like arms, it just blends many images of arms together and creates things like 3 arms. That's also why there are so many eyes, arms, legs and other things in other generative programs. It just spits out the training set in random configurations (ish).
I suspect also the reason the images look OK at a glance is because the images as a whole also represent patterns in the model so they actually come from "real life" / artist created images and thus have some sense of cohesion. But making the AI have all the right patterns so it never makes a mistake at all scales of the image while also being able to combine the pattern with real understanding of what they are conceptually is the real trick but until then it will be a "salad bowl collage" thing at random intervals.
The closest thing to the brain it looks like to me is simply the hierarchical nature of it which seems similar to v1/v2/the vision system in humans but I've only been told that, I'm no neuroscientist.
"It just spits out the training set in random configurations (ish)." is a pretty gross misrepresentation and oversimplification of how such a model works, akin to saying a human artist only spits out whatever they saw earlier in their life in random configurations, or saying that SD only spits out pixel values it has seen before, or combinations of pixel values that form edges, etc.
FWIW I don't think there is anything particularly wrong in the model architectures or training data that in some fundamental way makes it impossible to always get 2 arms. After all, lots of other tricky things are almost always correct. I suspect it's a question of training time and model size mostly (not trivial of course as it's still expensive to re-train to check modified architectures etc). It's also a matter of diffusion sampling iterations and choice of sampler at inference time, for the case of SD.
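For the SD-specific point about sampler choice and step count, here's a sketch assuming the Hugging Face diffusers library; the model ID, prompt, and parameter values are just examples:

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swapping the sampler (scheduler) and varying the number of denoising steps
# both noticeably change how often fine anatomical details come out mangled.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a portrait photo of a person, studio lighting",
    num_inference_steps=30,   # try e.g. 20 vs 50 and compare hands/eyes
    guidance_scale=7.5,
).images[0]
image.save("portrait.png")
```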
I get your point, but I also think it depends on what you mean by oversimplification. Of course there is _a lot_ of stuff going on and things like SD capture all kinds of information, not just what I described, however, any way you want to describe it, capturing all the "constraints" and real life knowledge to perfectly create realistic images with all the details and all the higher abstractions correctly is not anywhere close I think. Also it's not only to always get 2 arms, it's to - at the same time - also get 2 ears, 2 eyes, perfect pupils, perfect fingers, perfect trees, perfect chairs, all simultaneously (if it is to be used at least in the mainstream) - etc you get my point.
I also don't think there's anything wrong with the model architectures in themselves or the data, nor that it is impossible, only that it is hard and as you say I think it needs a lot of data and clever engineering to fix mistakes. It may even be possible to fix most mistakes, over time, which would be pretty impressive imo, but the absolute limits of what a model can produce/"contain" with our hardware is kind of an open question though interesting.
> but since it doesn't have a fundamental understanding of the patterns/objects like arms, it just blends many images of arms together and creates things like 3 arms.
But it rarely would put out, say, 8 arms. And the repeat artifacts are miles ahead of earlier stuff like CLIPDraw or Disco Diffusion. So it does seem to have some idea of what's going on, it just isn't perfect yet. It gets much worse away from the 512x512 resolution; if you push both dimensions it loses scene coherence a lot more.
Actually, I should have mentioned this in the original post, but come to think of it the "3 arms" thing is kind of a bad example. I think in general, at least with SD, it's very unlikely to create 3 arms or 8 arms if you, for example, ask for a person. Mostly it looks like a person because the text prompt maps to training data of persons, and so the results will generally look like people with 2 arms.
However, where I find it struggles is with finer details, and also _placement_ of things like arms and eyes, and the relationships between them. This I think is because it only has a general idea of the shape of persons but no data for the exact specifics, like where the arms, legs, eyes and so on should be placed in a realistically anatomical way. And this is where I think the challenge is: the gap between a general pattern of a person and an extremely specific but still general one that the model can modify and transform like a real human artist can. I'm not sure that's in the data exactly.
Isn't this because a training set usually consists of 99% implied things? Afaik, these never provide a full description like "…, also two hands, two legs, three fingers, arms not bent, adequately long limbs, leg asymmetry, cartoon physics, …", and they also never feed in examples of wrong geometry/biology/etc.
I'm no NN guy, but to me it all seems basically underconstrained and unrelated to "understanding". It's like those woodwork, magic trick, dancing, guitar, etc. teachers who fail to put into words how to do something and can only say "look", then just do it, ask you to repeat it, and get annoyed when you fail again.
For me the distinction is that when an artist draws a 3 armed person it stands out immediately. This makes me feel like something is going on in the ai that is similar to our brains because the blindspots seem similar.
And you're right that this is pretty unfounded intuition. Humans often seek meaning in things without meaning, so it might be unfounded. At some point all i can really do is shrug and say it feels "spooky" to me.
"it just blends many images of arms together and create things like 3 arms. Also why there are so many eyes, arms, legs and other things in other generative programs. It just spits out the training set in random configurations"
That is thoroughly confused to the point of uselessness.
The reason you get structural issues is because it's hard for the architecture to express large scale structure, but they get better and better at it simply by scaling up the network.
It was poorly communicated in my post but what I was referring to there were earlier programs like in 2015 and not SD and newer ones. If you put in an image of a landscape it could fill out the landscape with eyes and elbows all over the generated image because it had no information or context for what an eye was or where it should go.
But now you get SD, dalle and others which add more information not just by scaling, but also by mapping sentences/words to pre-existing images that already have cohesion. That way when you write in sentences to the text prompt, the model has more semantic information about what an eye is, but (IMO) only _indirectly_ because it will map a sentence to images that match that sentence.
The question is always what information is actually contained in the training set and what is missing from it and when it creates an image where is the information from etc.
In some ways, that means I think that meaning, to us as humans, is different from scaling, which is almost like pixel resolution, except it's resolution of patterns and differentiation of patterns. Meaning in this sense is things like creating a doorway with no actual door, where the doorway itself still looks super realistically rendered.
You can fix it by scaling and increasing the differentiation of patterns I guess, but you can never fix all instances completely with scaling. That's why in some ways I think meaning is sort of orthogonal to scale, however on a philosophical level, they should converge but that's for another topic.
I may have missed something in my thoughts here because this is sort of difficult to talk about without writing a book eventually.
the hierarchical structure of the visual system is a completely emergent property of the fact that the visual brain is a three-dimensional object encoding a two-dimensional object efficiently, by maintaining the spatial relations present in the data in the representation until you have finished using it.
While there's definitely a similarity it's also important not to over generalize. For example the human vision system and stable diffusion may end up using similar feature decomposition, but that doesn't mean the rest of the brain works anything like that.
I strongly suspect that if we do ever fully map the "architecture" of the brain, the result will be a massive graph that's not readily understandable by humans directly. This is already the case in biology. We'll end up with a computational artifact that'll help us understand cause and effect in the brain, but it'll be nothing like a tidy diagram of tensor operations like in state of the art ML papers.
Yes! I’m confident this isn’t an original thought, but I feel like it’s a dream generator. Things that aren’t quite right but are in some way, perfectly contextually and topologically valid. Like it’s tricking the object classifier in my brain with a totally unrealistic thing that my brain is ready to simply accept.
There’s some image I see on occasion that’s 100% garbage. If you focus on it you cannot make out a single thing. But if you glance at it or see it scaled down, it looks like a table full of stuff.
Not just image generating systems. Basically all neural network based AIs have a tendency to fail in ways that are very brain-like. GPT-3 produces output that seems to make sense as long as you aren't paying too close attention, image recognition AIs are more likely to mistake a cat for a dog than for something completely unrelated, speech recognition AIs often make very sensible seeming transcription errors, etc. Not to mention how revolutionary NNs have been in solving problems that brains find easy but machines have traditionally struggled with, and how terrible they are at things machines find easy and brains have traditionally struggled with. Maybe there's no relationship between the way NNs work and the way brains do, but the end result certainly seems to be similar.
I think anyone who has tripped would also commiserate. Seeing too many eyes or fingers at a glance. Things feeling cartoony or 'shiny'.
I don't know if AGI is down the road diffusion models have taken us. I'm not even really sure what most people mean by AI when they talk about it. But Stable Diffusion et al. are clearly superhuman at what they do. I'm not sure AGI lies down the trail cut by diffusion models, but if it's ever accomplished, these models will almost assuredly represent some of the learnings required to get there.
My pet (uneducated) theory is that AI needs to have a parent layer "consciousness" before it can become an AGI. Think of that voice inside your head and your ability to control bodily functions without needing to do it all the time. My model is our brains have many specialized "sub AIs" operating all the time (remembering to breathe for example) but then the AI behind the voice can come in and give commands that override the lower level AIs. What you think of as "me" is really just that top level AI but the whole system is needed to achieve general intelligence. Sort of like a company with many levels of employees serving different functions and a CEO to direct the whole thing, provide goals, modify components, and otherwise use discretion.
Seeing my hand covered in eyes while tripping completely changed my view of the mechanisms behind sight. Something that had previously seemed so “real” and deterministic suddenly was no longer; the interpretation layer was momentarily unveiled.
But wouldn't the people creating these models and deciding whether to publish them prefer ones with these "understandable" mistakes? There might have been other ones that had equal potential as far the evaluation measure goes, but humans had been involved all along the way and said, "Yeah, that picture looks like a person made it. We should keep developing this model."
For me, it's the way generative videos can rapidly, but to my eyes seamlessly, transition from one shape to another. I may not be able to record my dreams, but my memories of my dreams do match this effect, with one place or person suddenly becoming another.
Most introductory deep learning courses are very clear about how far the analogy goes, if people are interpreting it as something more I don't think it's the fault of practitioners/educators and more the fault of people's imagination and selective hearing.
> Study urges caution when comparing neural networks to the brain
They keep telling me this and, yet, I can't stop doing it. The more I learn about neural networks, the more I feel like I understand my own brain (whether accurate or not). And conversely, the more I think about thinking, the better my theories about how I'd build ML-based system to solve specific problems (admittedly, most untested). Neural networks seem like too useful of a model to simply give up because they aren't completely accurate.
Of course, this is all just for personal use - mostly introspection. I wouldn't exactly do medical work based on the model.
I once had the same feeling, but it was dispelled by acknowledging that NN neurons are not even approximations of how neurons work. At most, NNs are inspired by the topology of a subset of neurons, and that's where the similarity between NNs and biological neurons stop. It's like the connection between objects in real life and objects in programming. They're both useful abstractions that are inspired by things in the real world, but the similarities to things in the real world stop there.
Neurons have a lot going on; they send and receive signals through a multitude of mediums, not just neural impulses, and they're capable of plasticity when it comes to the connections they make with other neurons. Neurons also don't have simplistic activation functions; they're capable of doing a lot more with the information they receive and send. Also, gradient descent and backpropagation don't take place in any part of the brain.
Through that lens, I see NNs as if they're like really complex and impressive Markov chain generators. They can produce results that look intelligent, but it's just statistical correlations, and not at all how the brain works.
What attributes does a system need for you to accept its comparison to a brain/neuron?
Without defining what's essential, I'm nervous to call the comparison insufficient. If a topological subset of neurons isn't good enough, what do we need in addition/instead? If we stuff NNs full of complicated (how complicated?) activation functions, does that new system do the trick? Or add...47 new "neuron" variants? Or swap the learning scheme from gradient descent to something fancier? (For that matter, do we even know what the brain's scheme is, and why GA/back prop isn't an acceptably extremely crude approximation of it?)
The brain is so unimaginably intricate. Our models are hilariously simple in contrast, of course. But what of those mismatches are differences in kind vs. differences in magnitude?
> I see NNs as if they're like really complex and impressive Markov chain generators.
Funnily enough, that’s how I see brains. They start out as a few neurones basically implementing ‘hard wired’ logic, then some feedback loops form and next thing you know they’re asking “why am I here?”
People always do things like this, like an astrologer trying to say something profound using Heisenberg's uncertainty principle applying it to your romantic relationships, or whatever.
These "just so" stories are attractive but it is quite important to realize a metaphor which is intuitive and you perceive as useful is nothing at all like the process for finding real scientific truth. There is also a lot of introspective value to modeling the world as being controlled by mysterious gods who are pleased or appalled at your behavior and that's why good and bad things happen. Perhaps useful for some people but nothing at all like truth.
This is definitely what I was thinking about when I started my comment. I torpedoed my own point with the comment about the model being "too useful". Your explanation is much better.
When I said "I can't stop," I was referring more to this tendency to borrow models to explain unrelated systems. It's just a thing my brain wants to do and I can't help it (and again, I seem to convince myself that it's somehow accurate or useful even if, rationally, I'm quite sure it's not).
Yup, unrelated things do indeed often look and behave in similar ways. Humans are just a little bit overdriven to find patterns and end up finding some that don’t exist. It’s a useful trait for finding difficult patterns and there’s probably a stable point to maximize benefits which lands on finding a few too many.
It isn't really that much different from the "Are we living in a dream?" of the movie Inception, a question that's been asked since antiquity. Is the world some sort of illusion? How would we know?
Last week's Lex Fridman podcast featured Andrej Karpathy (former director of AI at Tesla, founding member of OpenAI) and they discussed this aspect briefly also.
The usefulness of neural networks has not ceased, even though researchers' early ideas and hopes about their biological analogies have somewhat fallen away.
As someone who studied Neuroscience in college, I remember this paper and some other examples showing just how different computational neural networks are from real neurons. It's difficult for me to believe that professional researchers could really believe a NN is an accurate model of the real deal.
The paper also does not have any reference to a study or paper that explicitly states that a neural network is a good model for grid cells. (Please correct me if I am wrong.) So I am left wondering why this direction was chosen.
Maybe it's a little cynical, but this topic seems to have been chosen (at least in part) to produce a splashy headline. Or in other words, to give the Stanford and MIT PR engine something to print.
This is the sort of obvious thing we all knew to be true. Why people with access to lab animals and a fully stocked microbiology lab needed to prove it (again) I do not understand.
Here's my guess: neurons tap into quantum mechanics but we are too primitive to understand that for now. The brain was initially modeled as humors/fluids back when we developed aqueducts, then telegraph came into the scene and it was modeled as electrical impulses and now computers/ML are popular therefore we see it as a neural network. Next step is quantum.
It's certainly not proven, but there are many hints in that direction, and the hints keep piling up. Recent research [1] on how the classic anaesthetics work (a great mystery!) suggests they operate by inhibiting the entanglement of pairs of electrons in small molecules which split into free radicals, the electrons then physically separated but still-entangled.
It seems it is at least possible, that there is speed-of-light quantum communication within the brain. And that consciousness may hinge fundamentally on this. If this is true, we're pretty much back to square one in terms of understanding.
We don't currently fully know how anesthetics work, largely because we don't really know how the human brain works on a large scale. We'd have to solve that before seriously proposing quantum effects. In other words, it's too early to rule out classical physics and chemistry as the brain's primary mechanism. (Although solving how the brain works could first require solving quantum mysteries, Occam's razor says classical rules, in my opinion.)
Chemistry is quantum physics at its core. It is just that quantum equations are so hard to solve for anything bigger than hydrogen that most of the times, chemists prefer to use empirical rules to do their job.
Einstein's spooky action at a distance, between neurons. This is speculative, maybe recklessly so, but one possible interpretation is that these are neurotransmitters. The halves of the entangled pair float off and they bind to different receptors, and do their usual neurotransmitter thing of affecting how the neuron fires. But they are entangled, so theoretically the quantum state of one half could affect the other half and alter the chemical properties of the molecule that contains the other electron. Neurotransmitters signalling through a quantum communication channel. This effect would propagate at light speed, although the subsequent chemical side of it would not.
"Spooky action at a distance" propagates at faster than light speed.
That's why Einstein thought it was spooky! But in the widespread interpretation[1] of quantum entanglement it turns out not to be a problem because (while entanglement effects are real) it's impossible to transmit information or action via it.
Worth noting that this link doesn't talk about that at all. Instead it's about quantum chemistry effects.
If the brain is using some physics we don't understand, that's something new, not quantum mechanics. QM is a specific theory of how the world operates; if something else is involved, it doesn't fall under that theory, it's [insert new theory's name here].
I really don't get why everyone wants the brain to operate on some new QM effect, other than people's perception that a 100-year-old theory is somehow cutting edge, spooky, or something. Perhaps it's that the overwhelming majority of people who talk about QM don't actually understand it even a little bit. Odd bits of QM are already why lasers, LEDs, and transistors work. You use insights from the theory every day in most electronic devices, but it's just as relevant for explaining old incandescent bulbs; we just had other theories that seemed to explain them.
I think you're probably missing a number of the important details. In the Penrose/Hameroff model, they're explicitly saying that humans are observed to generate problem solutions that could not have been generated by a purely classical computing process; therefore, the brain must exploit some specific quantum phenomenon.
When you talk about QM as a theory of how the world operates, there are wide ranges of QM: everything from predicting the structure and energy states of a molecule, to how P/N junctions work, to quantum computers. Now, for the first one (molecules), the vast majority of QM is just giving ways to compute electron densities and internuclear distances using some fairly straightforward and noncontroversial approaches.
For the other ones (P/N junctions, QC computers, etc), those involve exploiting very specific and surprising aspects of quantum theory: one of quantum tunnelling, quantum coherence, or quantum entanglement (ordered from least counterintuitive to most). We have some evidence already that there are some biological processes that exploit tunnelling and coherence, but none that demonstrate entanglement.
Personally, I think most people hold the alternative to Penrose: the brain does not compute non-computable functions, and does not exploit or need to exploit any quantum phenomena (except perhaps tunnelling) to achieve its goals.
Now, if we were to have hard evidence supporting the idea that brains use entanglement to solve problems: well, that would be pretty amazing and would upend large parts of modern biology and technology research.
The brain using entanglement would completely destroy modern physics as we know it; the effect on biology would be tiny by comparison.
Your other points are based on such a fundamental misunderstanding that it's hard to respond. Saying something isn't the output of classical computing processes, while undemonstrated, is then used to justify saying it must therefore use quantum phenomena. But logically, not everything is either classical or quantum, so even that inference is unjustified. It's like saying, well, it's not a soda so it must be a rock.
PS: If people were observed to solve problems that can't be solved by classical computer processing, that would be a really big deal. As in show-up-on-the-Nightly-News, win-people-Nobel-prizes big. Needless to say, it hasn't happened.
The set of problems that are computable by a classical computer are the same set of problems computable by a quantum computer. I think you might be misstating the Penrose argument/position.
I should have said "problems which do not have computable solutions" rather than "set of problems computable by a quantum computer", which seems fairly pedestrian compared to what Penrose is saying.
My understanding of the hypothesis being represented here is QM as a kind of random number generator operating at the neuron/microtubule level. I didn't think there was anything other than a modest injection of randomness being invoked, but I could be misstating the premise.
It's an absurd premise to begin with: The scale at which quantum effects propagate and are observed is radically different than the scale at which the neurons in your brain operate.
The functional channels for neurons are well understood, even if we're still diagramming out all the types of neurons. Voltage gated calcium channels are pretty damn simple in the grand scheme of things, and they don't leave space for quantum interactions beyond that of standard molecular interactions.
The only part of the brain we don't understand is how all the intricacies work together, because that's a lot more opaque.
Neurons almost certainly use quantum processes, but so do most transistors. The brain is too warm for large-scale quantum effects, though. You're not going to find phase coherence at that scale in such an environment, which is pretty much the prerequisite for quantum effects (and that much is fairly well understood).
I believe what was meant was quantum-only or primarily-quantum effects rather than the aggregate effects we normally see (classic physics & chemistry), which are probably the result of quantum physics, but we have "classic" abstractions that model them well enough. Thus, the issue is whether the brain relies mostly on classic effects (common aggregate abstractions) for computations or on quantum-specific effects.
I don't think that's a meaningful distinction. Many effects in classical physics are just previously poorly understood quantum effects. The distinction has more to do with when they were discovered than what causes them. Electricity is a good example. A large reason why electrons act collectively the way they do is a direct consequence of the pauli exclusion principle.
People don't understand quantum physics.
People also don't understand AGI.
Therefore it's obvious that they're related.
So it seems clear that AGI will be solved with the help of quantum physics.
My aunt Mildred is a very well renown academic and has written much on this topic.
She unfortunately is also not well understood.
So it seems quite clear - perhaps obvious - that AGI will be solved by applying some Mildred.
Because we don't understand quantum physics, and we don't understand the brain. I don't think we know if it's the final step. There could be wizard jelly or something at the bottom.
Quantum physics is fairly well understood. Perhaps not among laymen, but that's mostly due to pedagogical challenges, which is why a lot of the discourse seems to be stuck approaching it as though we were living nearly 100 years into the past.
I think it’s also important to highlight that the analogy between neural networks and brains is to help people visualize what a neural network is, not what a brain is. It’s really just to convey the idea of multiple nodes passing information to one another. After that point, the comparison is useless because the two systems diverge so wildly outside of that one (pretty loose) conceptual connection.
this article makes me sad ... a neural network can also be a network of biological neurons; the author means artificial neural network https://en.m.wikipedia.org/wiki/Neural_network
the Wikipedia article even goes into the differences, so why did we need a study for that?
A study urges caution comparing jellyfish to jelly ... tasters found they are not the same (even though I hear that fried jellyfish tastes nice...)
study urges caution comparing the model to the real thing, as the model has some generalizations the real thing does not ...
the motivation is also in the article, because the original research that suggested similarities in activity only achieved this by doing it under conditions that are implausible in biological systems, therefore that original research likely was misleading.
my point was that the architectures and models are so different that at least the title is trivial. Nobody, not even the original research, claimed that.
the suggested research and the paper in question deal with grid cells and how they emerge ...
Still, artificial neural networks are summations over functions ... vastly different from our brain's neural network. So yes, caution is advised, yet that point and the title are so obvious that we don't need a paper for it.
my assumption: the author hides the rather technical contribution of the paper behind a tautology to get some attention. Seems to have worked on Hacker News, as it's on the front page.
The brain is not involved in a whole lot of behaviors though. Cells organize themselves to an extent. Cuts heal without us focusing conscious thought on them.
The brain is a hard drive but the body is the whole computer.
Science is proving physical causation. Not just writing down what we want to be true.
Andrej Karpathy was recently on Lex Fridman's podcast and covered this to some extent. He has the same perspective on this topic and expanded on it quite a bit. Great listen overall IMHO - https://www.youtube.com/watch?v=cdiD-9MMpb0
Any true understanding or new insight originates in the plexus solaris, which is near your heart, then somewhat slowly works its way up the spine. The brain is a somewhat predictable fleshy motor capable of turning the insight into language, storing it in memory, or acting on it. Most of the time we "get by" with the stored procedures in the brain, but don't imagine it's the place where original understanding is generated. Funny how the ancient Egyptians understood this but we don't. Of course this is also why all attempts to create AI by simulating what happens in the brain are doomed to hilarious failure.
> Unique to Neuroscience, deep learning models can be used not only as a tool but interpreted as models of the brain. The central claims of recent deep learning-based models of brain circuits are that they make novel predictions about neural phenomena or shed light on the fundamental functions being optimized... Using large-scale hyperparameter sweeps and theory-driven experimentation, we demonstrate that the results of such models may be more strongly driven by particular, non-fundamental, and post-hoc implementation choices than fundamental truths about neural circuits or the loss function(s) they might optimize. Finally, we discuss why these models cannot be expected to produce accurate models of the brain without the addition of substantial amounts of inductive bias, an informal No Free Lunch result for Neuroscience. In conclusion, caution and consideration, together with biological knowledge, are warranted in building and interpreting deep learning models in Neuroscience.
And IMO a succinct description of the problematic assumption being cautioned against in the study's introduction section:
> Broadly, the essential claims of DL-based models of the brain are that 1) Because the models are trained on a specific optimization problem, if the resulting representations match what has been observed in the brain, then they reveal the optimization problem of the brain, or 2) That these models, when trained on sensibly motivated optimization problems, should make novel predictions about the brain’s representations and emergent behavior.
---
I think to most, the problem with claim number 2 directly above is obvious, but it's important to also look at claim 1.
I fail to see the significance of this "urging of caution". What's next? Will they tell us those are not really analog like neurons and are in fact using binary numbers in their calculations? O the horror!
Who cares? Everyone knows ML models do not reflect the mechanics of how biological brains work at a low level. The most obvious difference is that they use electricity, discrete numbers, a much faster refresh rate, etc. As a consequence, the other low-level "implementation details" will differ; the closer to "the hardware", the more differences there will be. I would be extremely surprised to see encoding and activation waves/patterns similar to those in biological systems in ML for this reason, but also because of how different the learning data and even the learning mechanism is. The brain has no backpropagation.
However, there is a deep similarity between the two, and IMO we are not far from AGI (decades at most). There is a measure of similarity between some advanced ML models (Stable Diffusion in the visual domain, BLOOM in reasoning) and how our thinking works. This is especially visible when those things break or produce unexpected results, in comparison with damaged or psychedelic-affected human brains.
Just like a human performing a math calculation and a computer performing the same calculation are doing essentially the same thing despite vastly different "implementation methods", and just as computers helped us advance our understanding of mathematics (and physics etc.), ML models will help us understand more about how our own thinking works.
Just as there is something universal in an act of adding two numbers, there is something universal in an act of processing language to derive intent and carry out complex instructions.
The crucial unknown at this stage, however, is whether our most advanced ML models are indeed using the same universal high-level mechanisms we do to understand our input when they demonstrate their incredible capabilities, or whether they are simply some advanced method of compressing and searching through the training data. The first stage of answering this question is to determine whether there is really a difference. Perhaps all we are are databases running an effective search algorithm over our training data?
This is what science hopefully will answer in the coming years. In one way, the pace of incredible discoveries from these new and bigger models is not leaving the scientific community enough time to study them fully. I can imagine many lifetimes could be spent just studying BLOOM or Stable Diffusion, but how do you do that when new models twice their size show up 6 months later? How do you focus on one model and one application of it in this quickly changing environment? Still, I'm very grateful that I can see this progress during my lifetime. While growing up in the 90s I had a feeling of "missed opportunity", that I never saw nor took any part in the computing revolution that happened before I was born, but this new AI revolution certainly makes up for that.
The meme of fusion being "5 years from now ... for the past 40 years" is so frustrating. This is because the investment into it went down to an abysmal level, not even the "maintenance" level of when it was just getting started.
If the government spent money on it, we would likely have more progress.
And today isn't like it was before -- we have ReBCO ;)
I sometimes wonder how much further along these initiatives might be if the economy were focused on them instead of My Pillow sales, cheap crap from Walmart, and simply handing personally wealthy elites stacks of cash to create pointless jobs, and if the money were put into net new technology, not Twitter 2.0 and VR 4.0.
China is definitely not a “planned economy” in a sense that you meant it. Also, every economy is planned in some sense. Every government plans how much it’s going to make and spend.
So all these US businesses do zero planning? The Fed is raising rates for lulz? We’ve unintentionally allowed consolidation of ownership? No one has any clue? You’re nitpicking semantics.