Llama 3 outputs text and can only see text; this is a vision model.
>that would make it Llama-2-based.
It's based on Llama 3; Llama 2 has nothing to do with it. They took Llama 3 Instruct and CLIP-ViT-Large-patch14-336, trained the projection layer first, and then later finetuned the Llama 3 checkpoint and trained a LoRA for the ViT.
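To make the architecture concrete: the "projection layer" is just a small trainable map from the vision encoder's feature space into the LLM's token embedding space. Here is a toy sketch of that idea only, with made-up tiny dimensions (the real models use something like 1024-dim CLIP features and 4096-dim Llama embeddings, and the projection is learned, not random):

```python
import random

# Illustrative dimensions only; real CLIP/Llama dims are much larger.
VIT_DIM, LLM_DIM = 4, 6

random.seed(0)
# The trainable projection: a single linear layer (weight matrix + bias).
W = [[random.gauss(0, 0.02) for _ in range(VIT_DIM)] for _ in range(LLM_DIM)]
b = [0.0] * LLM_DIM

def project(vit_feature):
    """Map one ViT patch feature into the LLM's embedding space."""
    return [sum(w * x for w, x in zip(row, vit_feature)) + b_i
            for row, b_i in zip(W, b)]

patch = [0.5, -1.0, 0.25, 2.0]    # one fake ViT patch feature
token_embedding = project(patch)  # now shaped like an LLM token embedding
print(len(token_embedding))
```

Each projected patch feature is then fed to the LLM as if it were a token embedding, which is why you can train the projection first while keeping both the ViT and the LLM frozen.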
That option has been removed from YouTube afaik; however, Google Photos now has this option on local devices. If you edit a video you can see a Stabilise option, which is actually the YouTube one.
Maybe it is not all related to sound but to vision! It must be related to inputs to the brain, but it can't be only sound.
If you have ever worked with computer vision, you know that rotating an image a little, say a slight tilt, can vastly increase or decrease your algorithm's performance.
For example, early face detection algorithms looked for the T zone in your face, from the eyebrows down the nose (cheekbones are generally brighter areas). However, those old algorithms performed very badly if the image was rotated 180 degrees (the T zone is not a T anymore). It doesn't even have to be 180; 45 degrees is enough to break most face detection algorithms.
In neural nets there is a standard practice called data augmentation: rotate the image a little and feed it back in as an additional training example. Augmenting the data improves algorithm performance.
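A minimal sketch of that augmentation idea, using plain 90-degree rotations on a tiny list-of-lists "image" (real pipelines use small random angles and interpolation, e.g. torchvision's RandomRotation, but the principle is the same: several rotated views, one label):

```python
def rot90(image):
    """Rotate a 2D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image):
    """Return the original plus three rotated copies; each keeps its label."""
    views = [image]
    for _ in range(3):
        views.append(rot90(views[-1]))
    return views

img = [[1, 2],
       [3, 4]]
dataset = augment(img)
print(len(dataset))   # 4 training examples from 1 image
print(dataset[1])     # [[3, 1], [4, 2]]
```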
In my opinion, dogs multiply their input by rotating or tilting their head, automatically augmenting their data! That way they can better detect whatever they were looking for.
Computer scientists keep taking inspiration from nature, but maybe it's time to explain some natural phenomena with computer science methods. Researchers push themselves to find new things in computing; maybe they sometimes go beyond nature unintentionally.
It would be interesting to know if blind or blind-folded dogs also exhibit this behaviour.
My intuition is that head-tilting behaviour is more analogous to how we (humans) look up and to the left/right when concentrating. That feels less like augmenting the visual data and more like blocking visual input while we focus compute on recall and more complex thought.
My dog is blind and does the head tilt when she hears certain noises (e.g. dogs barking). She was not born blind though, in case that could make a difference.
> rotating an image a little, lets say tilt, can vastly increase or decrease your algorithm's performance
Wouldn't be the first time evolution used the trick. The eyes, in mammals at least, constantly wiggle even when fixed on an object. If the eyes stop moving entirely, your field of vision fades out after a few seconds. https://en.wikipedia.org/wiki/Saccade
> the eyes move around, locating interesting parts of the scene and building up a mental, three-dimensional 'map' corresponding to the scene
Your reference is very interesting to me, because in computer vision we try to locate local features and match them across images, so that 3D correspondences can be extracted from multiple views; this is called structure from motion (SfM). Now I know humans (and animals) do the same thing!
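The "match these local features" step can be sketched very simply: each feature gets a descriptor vector, and you pair each descriptor in one view with its nearest neighbour in the other. A toy version with made-up 2D descriptors (real SfM uses SIFT-like descriptors plus ratio tests and geometric verification):

```python
def dist2(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_features(desc_a, desc_b):
    """Greedy nearest-neighbour matching of local feature descriptors."""
    matches = []
    for i, d in enumerate(desc_a):
        j = min(range(len(desc_b)), key=lambda k: dist2(d, desc_b[k]))
        matches.append((i, j))
    return matches

# Toy descriptors from two views of the same scene (small noise added).
view1 = [[0.0, 1.0], [5.0, 5.0], [9.0, 0.0]]
view2 = [[5.1, 4.9], [0.1, 1.1], [8.9, 0.2]]
print(match_features(view1, view2))  # [(0, 1), (1, 0), (2, 2)]
```

The matched pairs are what a SfM pipeline then triangulates into 3D points.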
That's a good point! And dogs don't have great vision: the quality of their visual input might be a bound on their ability to recognize the small unsmelly objects that are important to humans. I wonder if they do the head tilt when doing the same test with odors instead of objects (not that it'd prove anything).