
Andrej Karpathy's take:

New open weights LLM from @MistralAI

params.json:

- hidden_dim / dim = 14336/4096 => 3.5X MLP expand

- n_heads / n_kv_heads = 32/8 => 4X multiquery

- "moe" => mixture of experts 8X top 2

Likely related code: https://github.com/mistralai/megablocks-public

Oddly absent: an over-rehearsed professional release video talking about a revolution in AI.

If people are wondering why there is so much AI activity right around now, it's because the biggest deep learning conference (NeurIPS) is next week.

https://twitter.com/karpathy/status/1733181701361451130
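
For anyone decoding the '"moe" => mixture of experts 8X top 2' line: per token, a small gate picks 2 of the 8 expert MLPs and mixes their outputs. Below is a minimal PyTorch sketch of top-2 routing. The two-layer expert MLP and the gating details are illustrative assumptions (and the dims are scaled down so it runs quickly), not the actual Mistral implementation, which the megablocks repo above would reflect:

    import torch
    import torch.nn.functional as F

    # Real model: dim=4096, hidden_dim=14336; scaled down here, keeping the 3.5x ratio.
    dim, hidden_dim = 64, 224
    n_experts, top_k = 8, 2            # "moe": 8 experts, top 2 active per token

    gate = torch.nn.Linear(dim, n_experts, bias=False)
    experts = torch.nn.ModuleList(
        torch.nn.Sequential(
            torch.nn.Linear(dim, hidden_dim, bias=False),
            torch.nn.SiLU(),
            torch.nn.Linear(hidden_dim, dim, bias=False),
        )
        for _ in range(n_experts)
    )

    def moe_forward(x):                                   # x: (tokens, dim)
        logits = gate(x)                                  # (tokens, n_experts)
        weights, idx = torch.topk(logits, top_k, dim=-1)  # choose 2 experts per token
        weights = F.softmax(weights, dim=-1)              # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for e in range(n_experts):
            rows, slots = torch.where(idx == e)           # tokens routed to expert e
            if rows.numel():
                w = weights[rows, slots].unsqueeze(-1)
                out[rows] += w * experts[e](x[rows])
        return out

    print(moe_forward(torch.randn(4, dim)).shape)         # torch.Size([4, 64])

So all 8 experts sit in the parameter count, but only 2 run per token, which is why an "8X" MoE is much cheaper at inference than its total parameter count suggests.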




> Oddly absent: an over-rehearsed professional release video talking about a revolution in AI.


> it's because the biggest deep learning conference (NeurIPS) is next week.

Can we expect some big announcements (new architectures, models, etc.) at the conference from different companies? Sorry, I'm not too familiar with the culture around research conferences.


Typically not. Take Google as an example: the Transformer paper (Vaswani et al., 2017) was arXiv'd in June 2017, and NeurIPS (the conference in which it was published) wasn't until December of that year; BERT (Devlin et al., 2019) was similarly arXiv'd well before publication.

Recent announcements from companies tend to be even more divorced from conference dates, as they release anemic "Technical Reports" that largely wouldn't pass muster in peer review.


> - hidden_dim / dim = 14336/4096 => 3.5X MLP expand

> - n_heads / n_kv_heads = 32/8 => 4X

These two are exactly the same as the old Mistral-7B
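
(For context on that 32/8 ratio: in grouped-query attention each KV head serves n_heads / n_kv_heads = 4 query heads, so the KV cache is 4x smaller than with full multi-head attention. A shape-only PyTorch sketch; the tensor layout is assumed for illustration rather than taken from the Mistral code:)

    import torch

    dim, n_heads, n_kv_heads = 4096, 32, 8
    head_dim = dim // n_heads                        # 128
    group = n_heads // n_kv_heads                    # 4 query heads share each KV head

    tokens = 10
    q = torch.randn(tokens, n_heads, head_dim)       # 32 query heads
    k = torch.randn(tokens, n_kv_heads, head_dim)    # only 8 K/V heads are kept in cache
    v = torch.randn(tokens, n_kv_heads, head_dim)

    # Broadcast each KV head across its group of 4 query heads before attention.
    k = k.repeat_interleave(group, dim=1)            # (tokens, 32, head_dim)
    v = v.repeat_interleave(group, dim=1)

    scores = q.transpose(0, 1) @ k.permute(1, 2, 0) / head_dim ** 0.5   # (32, tokens, tokens)
    out = torch.softmax(scores, dim=-1) @ v.transpose(0, 1)             # (32, tokens, head_dim)
    print(out.shape)   # torch.Size([32, 10, 128]) -- no causal mask, shapes only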


Also, because EMNLP 2023 is happening right now.



