> it's because the biggest deep learning conference (NeurIPS) is next week.
Can we expect big announcements (new architectures, models, etc.) from the various companies at the conference? Sorry, I'm not too familiar with the culture around research conferences.
Typically not. Take Google as an example: the Transformer paper (Vaswani et al., 2017) was arXiv'd in June 2017, while NeurIPS (the conference where it was published) didn't take place until December of that year; BERT (Devlin et al., 2019) was likewise on arXiv well before its formal publication.
Recent announcements from companies tend to be even more divorced from conference dates; instead, they release anemic "Technical Reports" that largely wouldn't pass muster in peer review.
New open weights LLM from @MistralAI
params.json:
- hidden_dim / dim = 14336/4096 => 3.5X MLP expand
- n_heads / n_kv_heads = 32/8 => 4X multiquery
- "moe" => mixture of experts 8X top 2 (see the routing sketch below)
Likely related code: https://github.com/mistralai/megablocks-public
Oddly absent: an over-rehearsed professional release video talking about a revolution in AI.
If people are wondering why there is so much AI activity right around now, it's because the biggest deep learning conference (NeurIPS) is next week.
https://twitter.com/karpathy/status/1733181701361451130
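For anyone decoding the params.json shorthand above, here's a minimal sketch of what "mixture of experts 8X top 2" routing could look like. Everything in it is an assumption for illustration, not Mistral's code: toy dimensions, made-up class names, and a dense Python loop in place of efficient sparse kernels (the real implementation is presumably closer to the megablocks repo linked above).

```python
# Illustrative sketch only -- NOT Mistral's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, HIDDEN_DIM = 64, 224   # toy sizes; params.json has 4096 and 14336,
                            # i.e. 14336/4096 = 3.5x MLP expansion
N_EXPERTS, TOP_K = 8, 2     # "moe": 8 experts, top-2 routing per token

class Expert(nn.Module):
    """One SwiGLU-style feed-forward block (gate/up/down projections)."""
    def __init__(self):
        super().__init__()
        self.w1 = nn.Linear(DIM, HIDDEN_DIM, bias=False)  # gate proj
        self.w2 = nn.Linear(HIDDEN_DIM, DIM, bias=False)  # down proj
        self.w3 = nn.Linear(DIM, HIDDEN_DIM, bias=False)  # up proj

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class Top2MoE(nn.Module):
    """Each token picks its top-2 experts; the expert outputs are mixed
    with gate weights renormalized over just those two experts."""
    def __init__(self):
        super().__init__()
        self.gate = nn.Linear(DIM, N_EXPERTS, bias=False)
        self.experts = nn.ModuleList(Expert() for _ in range(N_EXPERTS))

    def forward(self, x):                       # x: (tokens, DIM)
        weights, idx = self.gate(x).topk(TOP_K, dim=-1)
        weights = weights.softmax(dim=-1)       # renormalize the top-2 scores
        out = torch.zeros_like(x)
        for k in range(TOP_K):                  # k-th choice of every token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e           # tokens whose k-th pick is e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(5, DIM)      # 5 tokens
y = Top2MoE()(x)             # same shape; only 2 of 8 experts run per token
```

The attention ratio isn't sketched here, but the idea is the same kind of arithmetic: 32 query heads over 8 key/value heads means each group of 4 query heads shares one K/V pair (grouped-query attention), cutting the KV cache roughly 4x at inference time.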