I'm not sure what the exact limits were, but I definitely recall running into server errors with S3 and the OCI equivalent service: not technically 429s, but enough to effectively limit throughput. SQS did return 429s, I believe due to the number of requests rather than messages, but it only supports batches of at most 10 messages.
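For what it's worth, the workaround amounts to chunking before each SendMessageBatch call. This is a hypothetical helper, not my actual pipeline code, and the boto3 usage in the comment is just a sketch:

```python
from itertools import islice

def batches(items, size=10):
    """Yield successive chunks of at most `size` items.

    10 is SQS's SendMessageBatch cap; chunking like this trades
    request count (the thing that gets throttled) for batch size.
    """
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

# Hypothetical boto3 usage (queue_url and entry shape are assumptions):
# for chunk in batches(entries):
#     sqs.send_message_batch(QueueUrl=queue_url, Entries=chunk)
```

Even with batching, you still pay one request per 10 messages, which is why the request-rate limit rather than the message count was the bottleneck.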
I definitely wanted these to "just work" out of the box (and maybe I could've worked more with AWS/OCI given more time), as I wanted to focus on the actual search.
Seems like repopack only packs the repo. How do you apply the refactors back to the project? Is it something that Claude projects does automatically somehow?
Thank you! And thanks for raising that issue. I've pushed a fix that should hopefully mitigate this for you: it's possible to unselect, card images are hidden on mobile, and the invisible results area around a card (caused by the tallest card stretching the results area) should no longer intercept map touches. Let me know if it helps!
Thanks for the great pointers! I didn't get the time to look into hierarchical clustering unfortunately, but it's on my TODO list. Your comment about making the map clearer is a good one, and I think there are a lot of low-hanging approaches for improving it. Another thing for the TODO list :)
Thanks! Yeah I'd like to dive deeper into the sentiment aspect. As you say it'd be interesting to see some overview, instead of specific queries.
The negative sentiment stood out to me mostly because I was expecting a more "clear-cut" sentiment graph: largely neutral-positive, with spikes in the positive direction around positive posts and negative around negative posts. However, for almost all my queries, the sentiment was almost always negative. Even positive posts apparently attracted a lot of negativity (according to the model and my approach, both of which could be wrong). It's something I'd like to dive deeper into, perhaps in a future blog post.
The sentiment issue is a curious one to me. For example, a lot of the humans I interact with who are not devs take my direct questioning or critical responses to be "negative" when there is no negative intent at all. Pointing out that something doesn't work, or anything else the dev community encounters on a daily basis, isn't inherently negative sentiment; it's just pointing out the issues. Is it like the meme of the helicopter parent constantly doling out praise, so that anything different registers as negativity? Not every piece of art needs to be hung on the fridge door, and providing constructive criticism for improvement is oh so often framed as negative. That does the world no favors.
Essentially, I'm not familiar with HuggingFace or any models in this regard. But if they are trained from the socials, then it seems skewed from the start to me.
Also, fully aware that this comment will probably be viewed as negative based on stated assumptions.
edit: reading further down the comments, clearly I'm not the first with these sentiments.
Speaking from experience, debate is easily misread as negative arguing by outsiders, even though all involved parties are enjoying challenging each other's ideas.
You may be right, a more tailored classifier for HN comments specifically may be more accurate. It'd be interesting to consider the classes: would it still be simply positive/negative? Perhaps constructive/unconstructive? Usefulness? Something more along the lines of HN guidelines?
Just one point of note: people are FAR more likely to respond and put pen to paper over something negative than something positive. I don't know the exact numbers, but negativity just engages people more. People just don't pick up the pen to write about how good something is as much.
I did something related for my ChillTranslator project, which translates spicy HN comments into calm variations. It has a GGUF model that runs easily and quickly, but it's early days. I built it with a much smaller set of data: I used LLMs to generate calm variations and an algorithm to pick the closest, least spicy one to make the synthetic training data, then fine-tuned Phi 2. I used Detoxify to identify spicy comments, and since OpenAI's sentiment analysis is free, I use that to verify Detoxify's classifications before generating a calm pair. I do worry that HN could implode / degrade if it can't keep a good balance in the comments and posts that people come here for. Maybe I can use your sentiment data to mine faster and generate more pairs. I've only done an initial end-to-end test so far (which works!). The model isn't yet as high quality as I'd like, but I haven't tried Phi 3 on it and have only used a very small fine-tuning dataset so far.
File is here though: https://huggingface.co/lukestanley/ChillTranslator
I've had no feedback from anyone on it though I did have a 404 in my Show HN post!
Anecdotally, I think anyone who reads HN for a while will realize it to be a negative, cynical place.
Posts written in sweet syrupy tones wouldn’t do well here, and jokes are in short supply or outright banned. Most people here also seem to be men. There’s always someone shooting you down. And after a while, you start to shoot back.
(Without wanting to sound negative or cynical) I don’t think it is, but maybe I haven’t been here long enough to notice. It skews towards technical and science and technology-minded people, which makes it automatically a bit ‘cynical’, but I feel like 95% of commenters are doing so at least in good faith. The same cannot be said of many comparable discussion forums or social media websites.
Jokes are also not banned; I see plenty on here. Low-effort ones and chains of unfunny wordplay or banter seem to be frowned upon though. And that makes it cleaner.
I've been here a hot minute and I agree with you. Lots of good faith. Lots of personal anecdotes presumably anchored in experience. Some jokes are really funny, just not reddit-style. Similarly, no slashdot quips generally, such as "first post" or "i, for one, welcome our new HN sentiment mapping robot overlords." Sometimes things get downvoted that shouldn't, but most of the flags I see are well deserved, and I vouch for ones that I think are not flag-worthy
I wonder how much of a person's impression of this is formed by their browsing habits.
As a parent comment mentions, big threads can be a bit of a mess, but usually only for the first couple of hours. Comments made in the spirit of HN tend to bubble up, while off-topic comments, rude comments, and bad jokes tend to percolate down over the course of hours. Also, a number of threads that tend to spiral get manually detached, which takes time to clean up.
Someone unfamiliar with how HN works who is consistently early to stories that attract a lot of comments is reading an almost entirely different site than someone who just catches up at the end of the day.
Some of the more negative threads will get flagged and detached, and by the end of the day a casual browse through the comments won't even come across them. E.g. something about the situation in the Middle East is going to attract a lot of attention.
I think it's the engineering mindset. You're always trying to figure out what's wrong with an idea, because you might be the poor bastard that ends up having to build it. Less costly all round if you can identify the flaw now, not halfway through sprint 7. After a while it bleeds into everything you do.
> Anecdotally, I think anyone who reads HN for a while will realize it to be a negative, cynical place.
I don't think this is particularly unique to HN. Anonymous forums tend to attract contrarian assholes. Perhaps this place is, erm, more poorly socially adapted than the general population, but I don't see it as very far outside the norm, apart from the average wealth of the posters.
Really? Mmm, I think HN is a place with, on average, above-average-intelligence people. People who understand that their opinion is not the only one. I rarely have issues with people here. Might also be because we are all in the same bubble here.
It's so interesting that in Likert-scale surveys I tend to see a huge positivity/agreement bias, but comments tend to be critical/negative. I think there is something about the format of feedback that skews the graph in general.
On HN, my theory is that positivity is the upvotes, and negativity/criticality is the discussion.
Personally, my contribution to your effort is that I would love to see a tool that could do this analysis for me over a dataset/corpus of my choosing. The code is nice, but it is a bit beyond me to follow in your footsteps.
Great work! Would you consider adding support for search-via-URL, e.g. https://hn.wilsonl.in/?q=sentiment+analysis? It would enable sharing and bookmarking of stable queries.
Lol, what a typical comment for today's HN. Condescending ("just plain wrong") with a jab ("this isn't a hugbox") placed in just to remind you that not only are you perceived to be wrong, but you've provoked anger. No evidence to justify the jab, no feedback to help fix what you perceive as flawed sentiment analysis. Just thoughtless condescension and anger. Why is the sentiment wrong? Is this a data-analysis trap the OP fell into? Nah, let's insult the OP instead.
In my experience having run a bunch of different sentiment models on HN comments, HN comments tend to place around neutral to slightly negative as a whole, even when I perceive the thread to be okay. However I've noticed a huge bump in negative sentiment on large HN threads. I generally find that absolute sentiment doesn't work in most corpuses because the model reflects its training set's sentiment labels. I generally find relative sentiment to be a lot more useful. I have yet to do a temporal sentiment analysis on HN but I have a suspicion that it's gotten more negative over time. I agree with another poster that I think HN needs to be careful to not become so negative that it just becomes an anger echo.
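By relative sentiment I mean something like normalizing scores against the corpus's own baseline, so "negative" means "more negative than is typical for this corpus" rather than an absolute label. A minimal sketch (my own toy helper, not any particular library's API):

```python
import statistics

def relative_sentiment(scores):
    """Z-score raw model sentiment scores within a corpus.

    A comment at the corpus mean maps to 0; below-baseline comments
    go negative, above-baseline ones positive. This cancels out the
    constant offset the model inherits from its training labels.
    """
    mean = statistics.fmean(scores)
    spread = statistics.pstdev(scores) or 1.0  # avoid div-by-zero on uniform corpora
    return [(s - mean) / spread for s in scores]
```

With this, a corpus that sits at "slightly negative" overall stops drowning out the genuinely unusual threads, which is the effect I care about.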
Relative sentiment between topics on this site is something I've done, and the results show the obvious: crypto threads are by and large negative, and most political and news-related threads are also highly negative.
Cynicism is perceived as more intelligent [0]. I personally find the HN brand of discussion to be difficult to bs my way into. But no matter your level of competency you can always find something to criticize and feel you've contributed. I wonder if academia or even "more intelligent" discussion in general would be counted as more negative.
As someone who is not an academic myself, but likes to listen to podcasts where academics discuss issues with each other, I often find that the conversations feel contentious, and sometimes they are, but the vast majority of the time the academics themselves feel like they're having a perfectly cordial and productive conversation. So I do think there is something to the idea that academic discussion comes across as being negative.
Sure, there's the 20% of comments that are outright rude, or tie everything back to their pet grievance (job satisfaction, government surveillance, the existence of JS).
But beyond that, the technical conversation has a negative, critical edge. A lot of comments come from the angle "You did something wrong by...", or only reply to correct.
There are still golden comments, and most personal anecdotes are treated respectfully, but it makes for an intimidating environment.
I think your curl approach would work just as well, if not better. My instinct was to reach for Node.js out of familiarity, but curl is fast and, given the IDs are sequential, something like `parallel curl ::: $(seq 0 $max_id)` would be pretty simple and fast. I did end up needing more logic though, so Node.js ultimately came in handy.
As for the Arrow file, I'm not sure unfortunately. I imagine there are some difficulties because the format is columnar, so it probably wants a batch of rows (when writing) instead of one item at a time.