pmaze's comments

pmaze · 2024-09-10T09:50:31 1725961831

I did: there was a first round of UMAP down to 50 dimensions. Running HDBSCAN on the full embeddings gave bad results, with lots of singleton clusters.

maCDzP · 2024-09-18T17:42:43 1726681363

Interesting, I got the opposite result. The full embeddings gave two or three clusters. How did you tune the HDBSCAN hyperparameters?

pmaze · 2024-09-10T09:38:12 1725961092

The crash was indeed not intended - my mistake! Should be fixed now.

You've got the cluster semantics spot on, to be honest. Broad genres are grouped together, with a tendency for sub-genres to be grouped locally within those.

There is no interpretation of the overall shapes or the global structure; those are more a result of a particular UMAP run than anything inherent in the data.

Would love to provide different views on it and go more in depth next, thanks for the suggestion.

peteforde · 2024-09-10T14:09:26 1725977366

IMO, evolution over time is a great place to start.

pmaze · 2024-09-10T09:32:38 1725960758

Hey, thanks for reporting - this is fixed now. I messed up the static build and some browsers freaked out. By law of showing things publicly, I of course only tested in a browser that didn't. Hope you can give it another chance!

pmaze · 2024-09-10T09:27:35 1725960455

My apologies for that! First time deploying SvelteKit to Cloudflare Pages, and I messed up the static build. Should be fixed now, hope you can give it another shot.

pmaze · 2024-09-08T18:07:49 1725818869

Thanks!

The cluster memberships that come out of the first round are distributions over the different clusters, e.g. a given book is weighted 0.8 for cluster A and 0.2 for cluster B. The Hellinger distance is well-suited to quantify the difference between two distributions like that. Cosine similarity and Euclidean distance worked as well, but Hellinger gave subjectively nicer results.
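
To make the Hellinger comparison concrete, here is a minimal sketch using only the standard library. It is not the project's actual code: the function name `hellinger` and the example membership vectors are made up for illustration, and it assumes each membership vector is non-negative and sums to 1.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    Assumes p and q have the same length, contain non-negative
    weights, and each sum to 1 (e.g. soft cluster memberships).
    The result lies between 0 (identical) and 1 (no overlap).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2)

# Hypothetical memberships over clusters A and B.
book_x = [0.8, 0.2]  # mostly cluster A
book_y = [0.2, 0.8]  # mostly cluster B
book_z = [0.7, 0.3]  # close to book_x

print(hellinger(book_x, book_y))  # farther apart
print(hellinger(book_x, book_z))  # closer together
```

Because the distance is bounded and works directly on probability-like weights, books with similar membership profiles end up close together even when neither belongs fully to a single cluster.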

Very interesting question, I'm not sure! While developing, I noticed that the systems thinking books were spread over different genres, which I found quite pleasing. However, I'm not sure if other books were even more diffuse. I'll have to dig back in and find out :)