Hacker News new | past | comments | ask | show | jobs | submit login

That isn't a very good example. The vectors for each token are randomly initialized with each element taken from the normal distribution. After training, similar words will have some cosine similarity, but almost never as much cosine similarity as [1,2,3,4] and [2,3,4,5].



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: