
Andrej Karpathy's take:

New open weights LLM from @MistralAI

params.json:

- hidden_dim / dim = 14336/4096 => 3.5X MLP expand

- n_heads / n_kv_heads = 32/8 => 4X multiquery

- "moe" => mixture of experts 8X top 2

Likely related code: https://github.com/mistralai/megablocks-public

Oddly absent: an over-rehearsed professional release video talking about a revolution in AI.

If people are wondering why there is so much AI activity right around now, it's because the biggest deep learning conference (NeurIPS) is next week.

https://twitter.com/karpathy/status/1733181701361451130
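
For anyone decoding the '"moe" => mixture of experts 8X top 2' line: per token, a small gate picks 2 of the 8 expert MLPs and mixes their outputs. Below is a minimal PyTorch sketch of top-2 routing. The two-layer expert MLP and the gating details are illustrative assumptions (and the dims are scaled down so it runs quickly), not the actual Mistral implementation, which the megablocks repo above would reflect:

    import torch
    import torch.nn.functional as F

    # Real model: dim=4096, hidden_dim=14336; scaled down here, keeping the 3.5x ratio.
    dim, hidden_dim = 64, 224
    n_experts, top_k = 8, 2            # "moe": 8 experts, top 2 active per token

    gate = torch.nn.Linear(dim, n_experts, bias=False)
    experts = torch.nn.ModuleList(
        torch.nn.Sequential(
            torch.nn.Linear(dim, hidden_dim, bias=False),
            torch.nn.SiLU(),
            torch.nn.Linear(hidden_dim, dim, bias=False),
        )
        for _ in range(n_experts)
    )

    def moe_forward(x):                                   # x: (tokens, dim)
        logits = gate(x)                                  # (tokens, n_experts)
        weights, idx = torch.topk(logits, top_k, dim=-1)  # choose 2 experts per token
        weights = F.softmax(weights, dim=-1)              # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for e in range(n_experts):
            rows, slots = torch.where(idx == e)           # tokens routed to expert e
            if rows.numel():
                w = weights[rows, slots].unsqueeze(-1)
                out[rows] += w * experts[e](x[rows])
        return out

    print(moe_forward(torch.randn(4, dim)).shape)         # torch.Size([4, 64])

So all 8 experts sit in the parameter count, but only 2 run per token, which is why an "8X" MoE is much cheaper at inference than its total parameter count suggests.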




> Oddly absent: an over-rehearsed professional release video talking about a revolution in AI.


> it's because the biggest deep learning conference (NeurIPS) is next week.

Can we expect some big announcements (new architectures, models, etc.) at the conference from different companies? Sorry, I'm not too familiar with the culture around research conferences.


Typically not. Take Google as an example: the Transformer paper (Vaswani et al., 2017) was arXiv'd in June 2017, and NeurIPS (the conference in which it was published) wasn't until December of that year; BERT (Devlin et al., 2019) was similarly arXiv'd well before publication.

Recent announcements from companies tend to be even more divorced from conference dates, as they release anemic "Technical Reports" that largely wouldn't pass muster in peer review.


> - hidden_dim / dim = 14336/4096 => 3.5X MLP expand

> - n_heads / n_kv_heads = 32/8 => 4X

These two are exactly the same as the old Mistral-7B
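
(For context on that 32/8 ratio: in grouped-query attention each KV head serves n_heads / n_kv_heads = 4 query heads, so the KV cache is 4x smaller than with full multi-head attention. A shape-only PyTorch sketch; the tensor layout is assumed for illustration rather than taken from the Mistral code:)

    import torch

    dim, n_heads, n_kv_heads = 4096, 32, 8
    head_dim = dim // n_heads                        # 128
    group = n_heads // n_kv_heads                    # 4 query heads share each KV head

    tokens = 10
    q = torch.randn(tokens, n_heads, head_dim)       # 32 query heads
    k = torch.randn(tokens, n_kv_heads, head_dim)    # only 8 K/V heads are kept in cache
    v = torch.randn(tokens, n_kv_heads, head_dim)

    # Broadcast each KV head across its group of 4 query heads before attention.
    k = k.repeat_interleave(group, dim=1)            # (tokens, 32, head_dim)
    v = v.repeat_interleave(group, dim=1)

    scores = q.transpose(0, 1) @ k.permute(1, 2, 0) / head_dim ** 0.5   # (32, tokens, tokens)
    out = torch.softmax(scores, dim=-1) @ v.transpose(0, 1)             # (32, tokens, head_dim)
    print(out.shape)   # torch.Size([32, 10, 128]) -- no causal mask, shapes only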


Also, because EMNLP 2023 is happening right now.



