What are the steps required to get this running in VS Code?
If they had linked to the instructions in their post (or better yet a link to a one click install of a VS Code Extension), it would help a lot with adoption.
(BTW, I consider it malpractice that they are at the top of Hacker News with a model that is of great interest to a large portion of the users here, and they do not have a monetizable call to action featured on the page.)
If you can run this using ollama, then you should be able to use https://www.continue.dev/ with both IntelliJ and VSCode. Haven’t tried this model yet - but overall this plugin works well.
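For what it's worth, continue.dev's Ollama provider just talks to the local Ollama HTTP server on port 11434, so you can sanity-check the setup outside the editor first. A minimal Python sketch of that (the codestral-mamba tag is hypothetical until Ollama/llama.cpp actually support the model, as noted in the reply below):

    # Minimal sketch: query the local Ollama HTTP API directly.
    # continue.dev's "ollama" provider talks to this same endpoint.
    # The "codestral-mamba" tag is hypothetical until support lands.
    import json
    import urllib.request

    payload = json.dumps({
        "model": "codestral-mamba",  # hypothetical tag
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,
    }).encode()

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])

If that responds, pointing the plugin at the same model should work too.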
Correct. The only back-end that Ollama uses is llama.cpp, and llama.cpp does not yet have Mamba2 support. The issues to track Mamba2 and Codestral Mamba support are here:
Unrelated, all my devices freeze when accessing this page, desktop Firefox and Chrome, mobile Firefox and Brave.
Is this the best alternative for accessing AI code helpers in VS Code besides GitHub Copilot and Google Gemini?
I've been using it for a few months (with Starcoder 2 for code, and GPT-4o for chat). I find the code completion actually better than GitHub Copilot.
My main complaint is that the chat sometimes fails to correctly render some GPT-4o output (e.g. LaTeX expressions), but it's mostly fixed with a custom system prompt. It also significantly reduces the battery life of my MacBook M1, but that's expected.
"All you need is users" doesn't seem optimal IMHO, Stability.ai providing an object lesson in that.
They just released weights, and being a for-profit, they need to optimize for making money, not eyeballs. It seems wise to guide people to the API offering.
On top of Hacker News (the target demographic for coders) without an effective monetizable call to action? What a missed opportunity.
GitHub Copilot makes $100M+/year, if not way, way more.
Having a VS Code extension for Mistral would be a revenue stream if it were one-click and better or cheaper than GitHub Copilot. In my mind it is malpractice not to be doing this if you are investing in creating coding models.
I see, that makes sense: make an extension and charge for it.
I assumed they meant free and local. It doesn't seem rational to make this one paid: it's significantly smaller than their better model, and even more so than Copilot's.
But they also signal competence in the space, which means M&A. Or big nation-states could hire them in the future to produce country models once the space matures, as was Emad's vision.
If you believe LLMs are going to end up built into everything and doing everything, from moderating social media to writing novels and history books, making such a model will be the most political thing that has ever happened.
If your country believes guns=bad, nipples=good, war=hell, but you get your novels and history books written by an LLM trained by people who believe guns=good, nipples=bad, war=heroic, it would be naive to expect the output to reflect your values and not theirs.
Even close allies of the US would be nervous to have such power in the hands of American multinational corporations alone - so the French state could be very eager for Mistral to produce a competitive product.
More or less; it was about as serious as your median Elon product tweet the last decade, or median coin nonsense.
It was a half-baked idea that the models would obviously need to be tuned for different languages / specific knowledge, and that countries would therefore pay to have that done.
There were many ideas like that, none of them panned out, hence the defenestration. All love for the guy, he did a very, very good thing. It's just meaningless to invoke it here: not only is it completely off-topic (if anything, that's already the play as the EU champion), but the Stability gentleman was just thinking out loud, nothing more.
I feel like local models could be an amazing coding experience because you could disconnect from the internet. Usually I need to open ChatGPT or Google every so often to solve some issue or generate some function, but this also introduces so many distractions. Imagine being able to turn off the internet completely and only have a chat assistant that runs locally. I fear, though, that it is just going to be a bit too slow at generating tokens on CPU to not be annoying.
I don't have a gut feel for how much difference the Mamba arch makes to inference speed, nor how much quantisation is likely to ruin things, but as a rough comparison Mistral-7B at 4 bits per param is very usable on CPU.
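For a concrete idea of what that looks like, here is a rough sketch of CPU-only inference on a 4-bit GGUF quant with llama-cpp-python. The model path is a placeholder, and Codestral Mamba itself won't work this way until llama.cpp grows Mamba2 support, so this is really the Mistral-7B comparison:

    # Rough sketch: 4-bit quantised Mistral-7B on CPU via llama-cpp-python
    # (pip install llama-cpp-python). The model path is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
        n_ctx=4096,    # context window
        n_threads=8,   # CPU threads; tune for your machine
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a binary search in Python."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])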
The issue with using any local model for code generation comes up when doing so in a professional context: you lose whatever infrastructure the provider might have for avoiding regurgitation of copyrighted code, so there's a legal risk there. That might not be a barrier in your context, but in my day-to-day it certainly is.
I signed up when Codestral was first available and put my payment details in. I've been using it daily since then with continue.dev, but my usage dashboard shows 0 tokens, and so far I have not been billed for anything... It's definitely not clear anywhere, but it seems to be free for now? Or there's some sort of free limit that I am not hitting.
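In case anyone wants to reproduce this: as I understand it, Codestral is served from a dedicated endpoint (codestral.mistral.ai) with its own API keys, separate from the main platform, which may be why usage isn't showing up in the usual dashboard. A rough sketch of calling it directly (endpoint, model name, and env var are my assumptions, not gospel):

    # Rough sketch: call the dedicated Codestral endpoint directly.
    # Endpoint and model name are my understanding, not confirmed;
    # the CODESTRAL_API_KEY env var is just a placeholder convention.
    import json
    import os
    import urllib.request

    body = json.dumps({
        "model": "codestral-latest",
        "messages": [{"role": "user", "content": "Explain this regex: ^[a-z]+$"}],
    }).encode()

    req = urllib.request.Request(
        "https://codestral.mistral.ai/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + os.environ["CODESTRAL_API_KEY"],
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])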
The website codegpt.co also has a plugin for both VS Code and IntelliJ. When the model becomes available in Ollama, you can connect the plugin in VS Code to a local Ollama instance.