
Mistral has published large language models, not embedding models. sgrep uses Google's Word2Vec to generate embeddings of the corpus and then performs similarity searches against it for a given user query.
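A minimal sketch of that pipeline (embed corpus with word vectors, rank by cosine similarity). The tiny hand-made vectors below are illustrative stand-ins; in practice you'd load pretrained Word2Vec vectors, and none of these names come from sgrep itself:

```python
import numpy as np

# Toy stand-in for pretrained Word2Vec vectors (hypothetical values;
# a real setup would load them from a trained model).
word_vectors = {
    "cat":  np.array([0.9, 0.1, 0.0]),
    "dog":  np.array([0.8, 0.2, 0.0]),
    "car":  np.array([0.0, 0.1, 0.9]),
    "road": np.array([0.1, 0.0, 0.8]),
}

def embed(text):
    """Average the word vectors of known tokens (a common Word2Vec trick)."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

corpus = ["the cat and the dog", "a car on the road"]
query = "dog"

# Rank corpus documents by similarity to the query embedding.
scores = [cosine(embed(query), embed(doc)) for doc in corpus]
best = corpus[int(np.argmax(scores))]
print(best)  # -> the cat and the dog
```

The word-averaging step is the key limitation the thread is circling: every occurrence of "dog" gets the same vector regardless of context.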


No, I got that. I asked because wouldn't embeddings generated by fine-tuned, transformer-based LLMs be more context-aware? I don't know much about the internals, so apologies if this was a dumb thing to say.


Embeddings come in handy to augment LLMs [0], but as you suspect, some use LLMs themselves as outright embedding models, with varying degrees of success: https://www.reddit.com/r/LocalLLaMA/comments/12y3stx/embeddi... / https://huggingface.co/spaces/mteb/leaderboard

[0] https://simonwillison.net/2023/Oct/23/embeddings/
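A hedged sketch of the usual trick for turning a transformer's per-token states into one sentence embedding: mask out padding, mean-pool, and L2-normalize. The hidden states here are random stand-ins; with a real model they'd come from the last layer, and all shapes and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated last-layer hidden states for one sentence: (seq_len, hidden_dim).
# A real pipeline would get these from a transformer's forward pass.
seq_len, hidden_dim = 5, 8
hidden_states = rng.normal(size=(seq_len, hidden_dim))
attention_mask = np.array([1, 1, 1, 1, 0])  # last position is padding

# Mean-pool only over real (unmasked) tokens, then L2-normalize,
# yielding a single fixed-size sentence embedding.
masked = hidden_states * attention_mask[:, None]
embedding = masked.sum(axis=0) / attention_mask.sum()
embedding = embedding / np.linalg.norm(embedding)

print(embedding.shape)  # -> (8,)
```

Because the hidden state for each token depends on the whole sentence, embeddings pooled this way are context-aware in exactly the sense asked about above, which is what the MTEB leaderboard models exploit.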



