Hacker News | pmaze's comments

I did: there was a first round of UMAP down to 50 dimensions. Running HDBSCAN on the full embeddings gave bad results, with lots of singleton clusters.


Interesting, I got the opposite result. The full embeddings gave two or three clusters. How did you work with the hyperparameters of HDBSCAN?


The crash was indeed not intended - my mistake! Should be fixed now.

You've got the cluster semantics spot on, to be honest. Broad genres are grouped together, with a tendency for sub-genres to be grouped locally within those.

There is no interpretation of the overall shapes or the global structure; those are more a result of a particular UMAP run than inherent in the data.

Would love to provide different views on it and go more in depth next, thanks for the suggestion.


IMO, evolution over time is a great place to start.


Hey, thanks for reporting - this is fixed now. I messed up the static build and some browsers freaked out. By law of showing things publicly, I of course only tested in a browser that didn't. Hope you can give it another chance!


My apologies for that! First time deploying SvelteKit to Cloudflare Pages, and I messed up the static build. Should be fixed now, hope you can give it another shot.


Thanks!

The cluster memberships that come out of the first round are distributions over the different clusters, e.g. a given book is weighted 0.8 for cluster A and 0.2 for cluster B. The Hellinger distance is well-suited to quantify the difference between two distributions like that. Cosine similarity and Euclidean distance worked as well, but Hellinger gave subjectively nicer results.
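To make the Hellinger part concrete, here is a minimal sketch using the standard definition for discrete distributions, H(p, q) = sqrt(1/2 * sum_i (sqrt(p_i) - sqrt(q_i))^2). The function name and the two example books are made up for illustration; this is not the project's actual code.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions over the same clusters.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 when the two distributions share no cluster at all.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical membership weights over clusters A and B.
book_x = [0.8, 0.2]
book_y = [0.2, 0.8]
print(hellinger(book_x, book_y))  # about 0.447
```

Because the distance works on square roots of the weights, a shift between two small weights counts for more than the same shift between two large ones, which may be part of why it behaves differently from Euclidean distance on membership vectors.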

Very interesting question, I'm not sure! While developing, I noticed that the systems thinking books were spread over different genres, which I found quite pleasing. However, I'm not sure if other books were even more diffuse. I'll have to dig back in and find out :)

