
I am with you on this. Mistral 7B is amazingly good. There are finetunes of it (the Intel one, and Berkeley Starling) that feel like they are within throwing distance of GPT-3.5 Turbo... at only 7B!

I was really hoping for a 13B Mistral. I'm not sure if this MoE will run on my 3090 with 24 GB. Fingers crossed that quantization + offloading + future tricks will make it runnable.
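For what it's worth, the quantization + offloading combo is already easy to try today with Hugging Face transformers: 4-bit weights via bitsandbytes plus device_map="auto" to spill whatever doesn't fit into CPU RAM. A minimal sketch below; the checkpoint name is my assumption for the new MoE release, not something confirmed in this thread.

    # Sketch: load a large MoE checkpoint on a single 24 GB GPU using 4-bit
    # quantization (bitsandbytes) plus CPU offload via accelerate's device_map.
    # The repo id below is an assumed placeholder, not taken from this thread.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "mistralai/Mixtral-8x7B-v0.1"  # assumed checkpoint name

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # NF4 weights cut VRAM roughly 4x vs fp16
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",                      # spill layers that don't fit onto CPU RAM
    )

    prompt = "Explain mixture-of-experts routing in one paragraph."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

Whether 4-bit plus offload is fast enough for an MoE this size on a 3090 is still an open question, but the mechanics are all there.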




True, I've been using the OpenOrca finetune and just downloaded the new UNA-Cybertron model, both tuned on the Mistral base.

They are not far from GPT-3 logic-wise, I'd say, if you account for the limited breadth of data, i.e. only so much fits in ~7 GB of weights, so they're missing other languages, niche topics, prose styles, etc.

I honestly wouldn't be surprised if a 13B model were indistinguishable from GPT-3.5 on some levels. And if that's the case, then coupled with the latest developments in decoding (UltraFastBERT, speculative decoding, Jacobi, lookahead, etc.), I could see local LLMs at the current GPT-4 level within a few years.
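Of those decoding tricks, speculative decoding is the one you can try right now: a small draft model proposes several tokens and the large target model verifies them in a single forward pass, so the output matches the target's greedy decoding but arrives faster. A toy sketch using transformers' assisted generation; the GPT-2 pair is chosen purely because the two models share a tokenizer and are small, not because anyone in this thread used them.

    # Toy sketch of speculative / assisted decoding with Hugging Face transformers:
    # a small draft model proposes tokens, the large target model verifies them in
    # one forward pass. GPT-2 models are placeholders; swap in any compatible pair
    # that shares a tokenizer.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
    target = AutoModelForCausalLM.from_pretrained(
        "gpt2-large", torch_dtype=torch.float16, device_map="auto"
    )
    draft = AutoModelForCausalLM.from_pretrained(
        "distilgpt2", torch_dtype=torch.float16, device_map="auto"
    )

    inputs = tokenizer(
        "Speculative decoding speeds up inference by", return_tensors="pt"
    ).to(target.device)

    # assistant_model switches generate() into assisted (speculative) mode:
    # same greedy output as the target alone, fewer sequential target steps.
    out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64, do_sample=False)
    print(tokenizer.decode(out[0], skip_special_tokens=True))

The speedup depends on how often the draft model's guesses are accepted, which is exactly why good small models like Mistral finetunes matter here too.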





