More

wilsonzlin · 2025-08-13T10:02:36 1755079356

I'm not sure what were the exact limits, but I definitely recall running into server errors with S3 and the OCI equivalent service — not technically 429s but enough to essentially limit throughput. SQS had 429s, I believe due to number of requests and not messages, but they only support batching at most 10.

I definitely wanted these to "just work" out of the box (and maybe I could've worked more with AWS/OCI given more time), as I wanted to focus on the actual search.

demarq · 2025-08-13T12:00:52 1755086452

Those are reasonable expectations. I’m very impressed with how it all worked out.

wilsonzlin · on Sept 10, 2024

Similar project: https://github.com/yamadashy/repopack

acoretchi · on Sept 10, 2024

Repopack with Claude projects has been a game changer for me on repository-wide refactors.

wasyl · on Sept 10, 2024

Seems like repopack only packs the repo. How do you apply the refactors back to the project? Is it something that Claude projects does automatically somehow?

krudnicki · on Sept 10, 2024

for me too

wilsonzlin · on May 11, 2024

Thanks for the heads up, just fixed this.

wilsonzlin · on May 10, 2024

Thank you! And thanks for raising that issue. I've pushed a fix that should hopefully mitigate this for you: it's possible to unselect, card images are hidden on mobile, and the invisible results area around a card (caused by the tallest card stretching the results area) should no longer intercept map touches. Let me know if it helps!

wilsonzlin · on May 9, 2024

Thanks for the great pointers! I didn't get the time to look into hierarchical clustering unfortunately but it's on my TODO list. Your comment about making the map clearer is great and something I think there's a lot of low-hanging approaches for improving. Another thing for the TODO list :)

wilsonzlin · on May 9, 2024

Thanks! Yeah I'd like to dive deeper into the sentiment aspect. As you say it'd be interesting to see some overview, instead of specific queries.

The negative sentiment stood out to me mostly because I was expecting a more "clear-cut" sentiment graph: largely neutral-positive, with spikes in the positive direction around positive posts and negative around negative posts. However, for almost all my queries, the sentiment was almost always negative. Even positive posts apparently attracted a lot of negativity (according to the model and my approach, both of which could be wrong). It's something I'd like to dive deeper into, perhaps in a future blog post.

dylan604 · on May 9, 2024

The sentiment issue is a curious one to me. For example, a lot of humans I interact with that are not devs take my direct questioning or critical responses to be "negative" when there is no negative intent at all. Pointing out something doesn't work or anything that the dev community encounters on a daily basis isn't an immediate negative sentiment but just pointing out the issues. Is it a meme-like helicopter parent constantly doling out praise positive so that anything differing shows negativity? Not every piece of art needs to be hung on the fridge door, and providing constructive criticism for improvement is oh so often framed as negative. That does the world no favors.

Essentially, I'm not familiar with HuggingFace or any models in this regard. But if they are trained from the socials, then it seems skewed from the start to me.

Also, fully aware that this comment will probably be viewed as negative based on stated assumptions.

edit: reading further down the comments, clearly I'm not the first with these sentiments.

uyzstvqs · on May 10, 2024

Speaking from experience, debate is easily misread as negative arguing by outsiders, even though all involved parties are enjoying challenging each other's ideas.

wilsonzlin · on May 10, 2024

You may be right, a more tailored classifier for HN comments specifically may be more accurate. It'd be interesting to consider the classes: would it still be simply positive/negative? Perhaps constructive/unconstructive? Usefulness? Something more along the lines of HN guidelines?

prox · on May 10, 2024

Just one point of note : people are FAR more likely to respond and take to writing to something negative than positive. I don’t know the exact numbers but it just engages people more. People just don’t pick up the pen to write how good something is as much.

flawsofar · on May 9, 2024

Every helicopter gets a trophy

dylan604 · on May 9, 2024

wait, the parents get a trophy?

luke-stanley · on May 9, 2024

I did something related for my ChillTranslator project for translating spicy HN comments to calm variations which has a GGUF model that runs easily and quickly but it's early days. I did it with a much smaller set of data, using LLM's to make calm variations and an algo to pick the closest least spicy one to make the synthetic training data then used Phi 2. I used Detoxify then OpenAI's sentiment analysis is free, I use that to verify Detoxify has correctly identified spicy comments then generate a calm pair. I do worry that HN could implode / degrade if there is not able to be a good balance for the comments and posts that people come here for. Maybe I can use your sentiment data to mine faster and generate more pairs. I've only done an initial end-to-end test so far (which works!). The model, so far is not as high quality as I'd like but I've not used Phi 3 on it yet and I've only used a very small fine-tune dataset so far. File is here though: https://huggingface.co/lukestanley/ChillTranslator I've had no feedback from anyone on it though I did have a 404 in my Show HN post!

deadbabe · on May 9, 2024

Anecdotally, I think anyone who reads HN for a while will realize it to be a negative, cynical place.

Posts written in sweet syrupy tones wouldn’t do well here, and jokes are in short supply or outright banned. Most people here also seem to be men. There’s always someone shooting you down. And after a while, you start to shoot back.

xanderlewis · on May 9, 2024

(Without wanting to sound negative or cynical) I don’t think it is, but maybe I haven’t been here long enough to notice. It skews towards technical and science and technology-minded people, which makes it automatically a bit ‘cynical’, but I feel like 95% of commenters are doing so at least in good faith. The same cannot be said of many comparable discussion forums or social media websites.

Jokes are also not banned; I see plenty on here. Low-effort ones and chains of unfunny wordplay or banter seem to be frowned upon though. And that makes it cleaner.

sethammons · on May 9, 2024

I've been here a hot minute and I agree with you. Lots of good faith. Lots of personal anecdotes presumably anchored in experience. Some jokes are really funny, just not reddit-style. Similarly, no slashdot quips generally, such as "first post" or "i, for one, welcome our new HN sentiment mapping robot overlords." Sometimes things get downvoted that shouldn't, but most of the flags I see are well deserved, and I vouch for ones that I think are not flag-worthy

goles · on May 9, 2024

I wonder how much of a persons impression of this is formed by their browsing habits.

As a parent comment mentions big threads can be a bit of a mess but usually only for the first couple of hours. Comments made in the spirit of HN tend to bubble up and off-topic, rude comments and bad jokes tend to percolate down over the course of hours. Also a number of threads that tend to spiral get manually detached which takes time to go clean up.

Someone who isn't somewhat familiar with how HN works that is consistently early to stories that attract a lot of comments is reading an almost entirely different site than someone who just catches up at the end of the day.

fragmede · on May 9, 2024

some of the more negative threads will get flagged and detached and by the end of the day a casual browse through the comments isn't even going to come across them. eg something about the situation in the middle east is going to attract a lot of attention.

flir · on May 9, 2024

I think it's the engineering mindset. You're always trying to figure out what's wrong with an idea, because you might be the poor bastard that ends up having to build it. Less costly all round if you can identify the flaw now, not halfway through sprint 7. After a while it bleeds into everything you do.

chiefalchemist · on May 10, 2024

> Anecdotally, I think anyone who reads HN for a while will realize it to be a negative, cynical place.

Sure, sometimes. But usually it's

Truth seeking > group thinking

There's a fine line between critical and cynical. Sometimes that line gets crossed. Sometimes the ambiguity of text-only comms clouds the water.

darby_eight · on May 9, 2024

> Anecdotally, I think anyone who reads HN for a while will realize it to be a negative, cynical place.

I don't think this is particularly unique to HN. Anonymous forums tend to attract contrarian assholes. Perhaps this place is more, erm, poorly socially-adapted to the general population, but I don't see it as very far outside the norm outside of the average wealth of the posters.

holoduke · on May 9, 2024

Really? Mmm i think hn is a place with on avarage above intelligent people. People who understand that their opinion is not the only one. I rarely have issues with people here. Might be also because we are all in the same bubble here.

abakker · on May 9, 2024

its so interesting that in Likert scale surveys, I tend to see huge positivity bias/agreement bias, but comments tend to be critical/negative. I think there is something related to the format of feedback that skews the graph in general.

On HN, my theory is that positivity is the upvotes, and negativity/criticality is the discussion.

Personally, my contribution to your effort is that I would love to see a tool that could do this analysis for me over a dataset/corpus of my choosing. The code is nice, but it is a bit beyond me to follow in your footsteps.

walterbell · on May 9, 2024

Great work! Would you consider adding support for search-via-url, e.g. https://hn.wilsonl.in/?q=sentiment+analysis. It would enable sharing and bookmarks of stable queries.

wilsonzlin · on May 9, 2024

Thanks for the suggestion, I've just added the feature:

https://hn.wilsonl.in/s/sentiment%20analysis

al_hag · on May 10, 2024

It will be a deep dive into the most essential of HN staples, the nitpick

beeboobaa3 · on May 9, 2024

[flagged]

Karrot_Kream · on May 9, 2024

Lol what a typical comment for today's HN. Condescending ("just plain wrong") with a jab ("this isn't a hugbox") placed in just to remind you that not only are you perceived to be wrong but you've provoked anger. No proof to provoke the jab, no feedback to help fix what you perceive as wrong sentiment analysis. Just thoughtless condescension and anger. Why is the sentiment wrong? Is this a data analysis trap the OP fell into? Nah let's insult the OP instead.

In my experience having run a bunch of different sentiment models on HN comments, HN comments tend to place around neutral to slightly negative as a whole, even when I perceive the thread to be okay. However I've noticed a huge bump in negative sentiment on large HN threads. I generally find that absolute sentiment doesn't work in most corpuses because the model reflects its training set's sentiment labels. I generally find relative sentiment to be a lot more useful. I have yet to do a temporal sentiment analysis on HN but I have a suspicion that it's gotten more negative over time. I agree with another poster that I think HN needs to be careful to not become so negative that it just becomes an anger echo.

Relative sentiment on this site between topics is something I've done and the obvious results show. Crypto threads are by-and-large negative, most political and news related threads are also highly negative.

the_sleaze_ · on May 9, 2024

Cynicism is perceived as more intelligent [0]. I personally find the HN brand of discussion to be difficult to bs my way into. But no matter your level of competency you can always find something to criticize and feel you've contributed. I wonder if academia or even "more intelligent" discussion in general would be counted as more negative.

https://journals.sagepub.com/doi/pdf/10.1177/014616721878319...

importantbrian · on May 9, 2024

As someone who is not an academic myself, but likes to listen to podcasts where academics discuss issues with each other, I often find that the conversations feel contentious, and sometimes they are, but the vast majority of the time the academics themselves feel like they're having a perfectly cordial and productive conversation. So I do think there is something to the idea that academic discussion comes across as being negative.

sdwr · on May 9, 2024

HN definitely has a negative valence.

Sure, there's the 20% of comments that are outright rude, or tie everything back to their pet grievance (job satisfaction, government surveillance, the existence of JS).

But beyond that, the technical conversation has a negative, critical edge. A lot of comments come from the angle "You did something wrong by...", or only reply to correct.

There are still golden comments, and most personal anecdotes are treated respectfully, but it makes for an intimidating environment.

beeboobaa3 · on May 9, 2024

Whoosh, I was making a point by styling my comment in a way that would be perceived as negative by sentiment analysis.

Good job doing a whole psychoanalysis based on what's basically a joke, though.

Karrot_Kream · on May 9, 2024

Heh did I miss the joke? That was a whoops indeed! Sentiment is hard on the internet ;)

> Good job doing a whole psychoanalysis based on what's basically a joke, though.

Guess there's still some work to be done on that positive sentiment replying eh? :)

beeboobaa3 · on May 9, 2024

That one was intentional ;)

wilsonzlin · on May 9, 2024

Thanks! Do you mean within the sentiment/popularity analysis graph? Or the points and topics within the map?

ashu1461 · on May 10, 2024

Points and topics within the map.

wilsonzlin · on May 9, 2024

Thank you for the kind words!

wilsonzlin · on May 9, 2024

Thanks for the kind words, and raising that problem --- I've added it as an issue to fix.

Thanks for sharing that article, it was an interesting read. It was cool how deep the analysis went with a few simple statistical methods.

wilsonzlin · on May 9, 2024

I think your curl approach would work just as fine if not better. My instinct was to reach for Node.js out of familiarity, but curl is fast and, given the IDs are sequential, something like `parallel curl ::: $(seq 0 $max_id)` would be pretty simple and fast. I did end up needing more logic though so Node.js did ultimately come in handy.

As for the Arrow file, I'm not sure unfortunately. I imagine there are some difficulties because the format is columnar, so it probably wants a batch of rows (when writing) instead of one item at a time.