abbaselmas's comments


That is someone else who has just used the Llava name.

It is not by the original group who have published a series of models under the Llava name.


This appears to be a Llava model which was then fine-tuned using outputs from Llama 3. If I understand correctly, that would make it Llama-2-based.


>fine-tuned using outputs from Llama 3.

Llama 3 outputs text and can only see text; this is a vision model.

>that would make it Llama-2-based.

It's based on Llama 3; Llama 2 has nothing to do with it. They took Llama 3 Instruct and CLIP-ViT-Large-patch14-336, trained the projection layer first, and then later fine-tuned the Llama 3 checkpoint and trained a LoRA for the ViT.
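
Roughly, that two-stage recipe looks like the sketch below in PyTorch. Everything here is my own illustration, not the authors' code: layer sizes are the usual ones for CLIP ViT-L (1024) and Llama-3-8B (4096), and the HF-style `inputs_embeds` call is an assumption.

    import torch
    import torch.nn as nn

    # Sketch of a LLaVA-style model: vision encoder -> projection -> LLM.
    class LlavaStyleVLM(nn.Module):
        def __init__(self, vision_encoder, llm, vision_dim=1024, llm_dim=4096):
            super().__init__()
            self.vision_encoder = vision_encoder  # e.g. CLIP-ViT-Large-patch14-336
            self.projection = nn.Linear(vision_dim, llm_dim)  # patch features -> token space
            self.llm = llm  # e.g. Llama 3 Instruct

        def forward(self, pixel_values, text_embeds):
            patches = self.vision_encoder(pixel_values)  # (B, num_patches, vision_dim)
            image_tokens = self.projection(patches)      # (B, num_patches, llm_dim)
            inputs = torch.cat([image_tokens, text_embeds], dim=1)
            return self.llm(inputs_embeds=inputs)        # HF-style call (an assumption)

    # Stage 1: freeze everything, train only the projection layer.
    def stage1_trainable(model):
        for p in model.parameters():
            p.requires_grad = False
        for p in model.projection.parameters():
            p.requires_grad = True
        return [p for p in model.parameters() if p.requires_grad]

    # Stage 2: additionally fine-tune the LLM (and, per above, a LoRA on the ViT).
    def stage2_trainable(model):
        for p in model.llm.parameters():
            p.requires_grad = True
        return [p for p in model.parameters() if p.requires_grad]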


Sheldon Cooper and the best number, 73: it is the 21st prime, and its mirror 37 is the 12th prime.

https://www.youtube.com/watch?v=HacqfsV7ug0
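
A quick sanity check of both claims in Python:

    # Simple sieve of Eratosthenes up to 100.
    def primes_up_to(n):
        sieve = [True] * (n + 1)
        sieve[0] = sieve[1] = False
        for i in range(2, int(n ** 0.5) + 1):
            if sieve[i]:
                sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
        return [i for i, is_prime in enumerate(sieve) if is_prime]

    primes = primes_up_to(100)
    assert primes.index(73) + 1 == 21  # 73 is the 21st prime
    assert primes.index(37) + 1 == 12  # 37 is the 12th prime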


I really dislike base-10 tricks. They are just cute, and rarely have any deeper significance.


Also, a deep dive into 73 and 37: https://www.youtube.com/watch?v=4DQndnAhdxk


My favorite one is visualping.io


Your chat button hides the Twitter button at the bottom of the page.


Good catch. Thank you for the feedback!


I had some problems with Celery a few days ago; this can't be a coincidence. Thanks for the info.


This option has been removed from YouTube, AFAIK. However, Google Photos now has this option on local devices: if you edit a video you can see a stabilize option, which is actually the old YouTube option.


Couldn't agree more. BTW, there are only three myths :facepalm:


Maybe it is not all related to sound but to vision! It must be related to the inputs to the brain, but it can't be only sound.

If you have ever worked with computer vision, you know that rotating an image a little, let's say a slight tilt, can vastly increase or decrease your algorithm's performance.

For example: early face detection algorithms looked for the T-zone in your face, from the eyebrows down to the nose (cheekbones are generally brighter areas). However, those old algorithms performed very badly if the image was rotated 180 degrees (the T-zone is not a T anymore). It doesn't even have to be 180 degrees; a 45-degree rotation is enough to break most face detection algorithms.
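
A rough way to see this with OpenCV's classic Haar-cascade detector (the image path is a placeholder, and exact counts will vary by image):

    import cv2

    # Viola-Jones-style detector, notoriously sensitive to in-plane rotation.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder image
    upright = cascade.detectMultiScale(img)
    flipped = cascade.detectMultiScale(cv2.rotate(img, cv2.ROTATE_180))

    # Typically finds the face upright but not upside down: the T is no longer a T.
    print(len(upright), "faces upright,", len(flipped), "faces at 180 degrees")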

In neural nets, there is a common practice called data augmentation: rotate an image a little and feed it in again as additional input data. Augmenting data improves algorithm performance. In my opinion, dogs multiply their input by rotating or tilting their heads, automatically augmenting their data! So they can better detect whatever they were looking for before.
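
In PyTorch, for instance, this kind of "head tilt" augmentation is a one-liner (a sketch; the ±15 degree range is my arbitrary choice):

    from torchvision import transforms

    # Randomly tilt each training image a little, like a dog tilting its head.
    augment = transforms.Compose([
        transforms.RandomRotation(degrees=15),  # uniform tilt in [-15, +15] degrees
        transforms.ToTensor(),
    ])

    # Applied to every image as it is loaded, e.g.:
    # dataset = torchvision.datasets.ImageFolder("data/train", transform=augment)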

Computer scientists keep taking inspiration from nature, but maybe it's time to explain some natural phenomena with computer science methods. Researchers force themselves to find new things in computing, and maybe they sometimes go beyond nature unintentionally.


It would be interesting to know if blind or blindfolded dogs also exhibit this behaviour.

My intuition is that head-tilting behaviour is more analogous to how we (humans) look up and to the left/right when concentrating. That feels less like augmenting the visual data and more like blocking visual input while we focus compute on recall and more complex thought.


My dog is blind and does the head tilt when she hears certain noises (e.g. dogs barking). She was not born blind though, in case that could make a difference.


It’s interesting some poker players before making a ridiculous bluff will look up and to the right.


> It’s interesting some poker players before making a ridiculous bluff will look up and to the right.

It's a common tell for lying in general, not just bluffing in poker.


> rotating an image a little, lets say tilt, can vastly increase or decrease your algorithm's performance

It wouldn't be the first time evolution used the trick. The eyes, in mammals at least, constantly wiggle even when fixed on an object. If the eyes stop moving entirely, your field of vision fades out after a few seconds. https://en.wikipedia.org/wiki/Saccade


> the eyes move around, locating interesting parts of the scene and building up a mental, three-dimensional 'map' corresponding to the scene

Your reference is very interesting to me, because in computer vision we try to locate local features and match them across images so that 3D correspondences can be extracted from multiple views, which is called structure from motion (SfM). Now I know that humans (and animals) do the same thing!
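
For the curious, the feature-matching step at the start of an SfM pipeline looks roughly like this with OpenCV (image names are placeholders):

    import cv2

    img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder images
    img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

    # Detect local features and compute descriptors in each view.
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Match descriptors across views; these correspondences feed triangulation.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    print(len(matches), "candidate correspondences between the two views")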


That's a good point! And dogs don't have great vision; the quality of their visual input might be a bound on their ability to recognize the small, unsmelly objects that are important to humans. I wonder if they do the head tilt when doing the same test with odors instead of objects (not that it'd prove anything).


Why is Netdata popular on HN now? There must be some big news or something...


I think it was just a random share by someone.


System76's blog is on Tumblr! I thought Tumblr was dead.


This was my biggest takeaway too: why Tumblr...


They also have a Telegram channel!

