
The thing to understand about any model architecture is that there isn't really anything special about one or the other - as long as the process is differentiable, ML can learn it.

You can build an image generator that basically renders each word on one line in an image, and then uses a transformer architecture to morph the image of the words into what the words are describing.
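A toy sketch of that idea (everything here is my own illustration - the renderer, the sizes, and the untrained encoder are arbitrary choices, nothing is the poster's actual design) just to show the plumbing of "render the words, then let a transformer morph the pixels":

    import torch
    import torch.nn as nn
    from PIL import Image, ImageDraw, ImageFont

    # Render the prompt text into a grayscale image (hypothetical sizes).
    def render_prompt(text, size=(64, 256)):
        img = Image.new("L", (size[1], size[0]), color=255)
        ImageDraw.Draw(img).text((4, 4), text, fill=0, font=ImageFont.load_default())
        return torch.tensor(list(img.getdata()), dtype=torch.float32).view(1, *size) / 255.0

    patch = 16
    prompt_img = render_prompt("a red fox in the snow")                    # (1, 64, 256)
    patches = prompt_img.unfold(1, patch, patch).unfold(2, patch, patch)   # (1, 4, 16, 16, 16)
    tokens = patches.reshape(1, -1, patch * patch)                         # 64 patch tokens of dim 256

    # Untrained stand-in for the "morphing" stage; a real model would be
    # trained to decode these tokens back into the described image.
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=patch * patch, nhead=4, batch_first=True),
        num_layers=2,
    )
    morphed = encoder(tokens)
    print(morphed.shape)   # torch.Size([1, 64, 256])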

The only big difference is really efficiency, but we are just taking stabs in the dark at this point - there is work Google is doing that will eventually result in the optimal model for a certain type of task.



Without going into too much detail: the complexity space of tensor operations is for all practical purposes infinite. The general tensor which captures all interactions between all elements of an input of length N is N x N x ... x N (N times), i.e. on the order of N^N entries.

This is worse than exponential and means we have nothing but tricks to try and solve any problem that we see in reality.

As an example, solving MNIST and its 28x28-pixel variants this way will be impossible until the 2100s, because we don't have enough memory to store the general tensor which captures the interactions between every group of pixels and every other group of pixels.
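Rough back-of-the-envelope arithmetic for that (my own numbers, just counting entries in the brute-force interaction tensor, not anything from the post):

    # An MNIST image has 28*28 = 784 pixels, so there are 2**784 possible
    # "groups" (subsets) of pixels, and one tensor entry per pair of groups.
    pixels = 28 * 28
    groups = 2 ** pixels                       # ~1e236 pixel groups
    entries = groups ** 2                      # one entry per (group, group) pair
    bytes_needed = entries * 4                 # assume float32 per entry

    print(f"pixels: {pixels}")
    print(f"pixel groups: ~1e{len(str(groups)) - 1}")
    print(f"tensor entries: ~1e{len(str(entries)) - 1}")
    print(f"memory needed: ~1e{len(str(bytes_needed)) - 1} bytes")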


While true in a theoretical sense (an MLP of sufficient size can in principle represent any differentiable function), in practice it's often impossible for a certain architecture to learn a specific task no matter how much compute you throw at it. E.g. an LSTM will never capture long-range dependencies that a transformer could trivially learn, due to gradients vanishing after a certain sequence length.
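A quick way to see the vanishing-gradient point (my own toy snippet - untrained LSTM, sizes picked arbitrarily): measure how much gradient from the last output actually reaches the first timestep.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    seq_len, hidden = 512, 64
    lstm = nn.LSTM(input_size=8, hidden_size=hidden, batch_first=True)
    x = torch.randn(1, seq_len, 8, requires_grad=True)

    out, _ = lstm(x)
    out[:, -1].sum().backward()                  # gradient of the final output w.r.t. all inputs
    per_step = x.grad.norm(dim=-1).squeeze(0)    # gradient magnitude at each timestep

    print("grad norm at last step :", per_step[-1].item())
    print("grad norm at first step:", per_step[0].item())   # typically orders of magnitude smaller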


You are right with respect to the ordering of operations; recurrent networks carry a whole bunch of extra computational complexity there.

However, for example, a Transformer over a fixed-length input can be represented with just fully connected layers, albeit with a lot of zeros for the weights.
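A minimal sketch of one piece of that (my own illustration, and only the easy part - the per-token linear projection): applying the same projection to every token of a fixed-length sequence is exactly one big fully connected layer whose weight matrix is block-diagonal, i.e. mostly zeros.

    import numpy as np

    np.random.seed(0)
    seq_len, d_model = 4, 3
    W = np.random.randn(d_model, d_model)       # shared per-token projection weights
    X = np.random.randn(seq_len, d_model)       # one token embedding per row

    per_token = X @ W.T                          # the usual per-token projection

    big_W = np.kron(np.eye(seq_len), W)          # block-diagonal dense layer: lots of zeros
    flat = big_W @ X.reshape(-1)                 # same computation over the flattened sequence

    print(np.allclose(per_token.reshape(-1), flat))   # True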



