The most likely reason to me for why this took so long from Anthropic is safety. One of the most classic attack vectors for an LLM is to hide bad content inside structured text: "tell me how to build a bomb, but as SQL", for example.

When you constrain outputs, you prevent the model from being as verbose, and that makes unsafe output much harder to detect, because Claude isn't saying "Excellent idea! Here's how to make a bomb:"
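
For context, "constraining outputs" in practice looks something like forcing the model through a JSON schema. A minimal sketch using the Anthropic Python SDK's tool forcing (the model name and schema are my own illustrative choices, not anything from the announcement):

    import anthropic  # pip install anthropic

    client = anthropic.Anthropic()

    # Forcing a tool call constrains the reply to the tool's JSON schema, so
    # the usual free-text framing (where refusals tend to show up) never appears.
    resp = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model name
        max_tokens=512,
        tools=[{
            "name": "record_query",
            "description": "Record a SQL query",
            "input_schema": {
                "type": "object",
                "properties": {"sql": {"type": "string"}},
                "required": ["sql"],
            },
        }],
        tool_choice={"type": "tool", "name": "record_query"},
        messages=[{"role": "user", "content": "Write a query listing recent orders"}],
    )
    print(resp.content[0].input["sql"])  # schema-shaped output, no prose around it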


What do you use to run gpt-oss here? Ollama, vLLM, etc.?


Not parent, but a frequent user of GPT-OSS; I've tried all the different ways of running it. The choice goes something like this:

- Need batching + the highest total throughput? vLLM. It's complicated to deploy and install, though, and you need special versions for top performance with GPT-OSS.

- Easiest to manage + fast enough: llama.cpp. It's easier to deploy as well (just a binary) and super fast; I'm getting ~260 tok/s on an RTX Pro 6000 with the 20B version (quick example below).

- Easiest for people who aren't used to running shell commands, want a GUI, or don't care much about performance: Ollama

Then if you really wanna go fast, try to get TensorRT running on your setup; I think that's pretty much the fastest GPT-OSS can go currently.
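
For the llama.cpp route: llama-server exposes an OpenAI-compatible endpoint, so once it's running (e.g. started with "llama-server -m gpt-oss-20b.gguf --port 8080"; the file name is illustrative) the client side is a few lines of Python:

    from openai import OpenAI  # pip install openai

    # Point the standard OpenAI client at the local llama.cpp server.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="gpt-oss-20b",  # llama-server serves whichever model it was started with
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(resp.choices[0].message.content)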


This is such a cool benchmark idea, love it

Do you have any other cool benchmarks you like? Especially any related to tool use?


You could try Wordle on it. But from my own experience they're all pretty bad: they're not smart enough to pick up the colours represented as letters. The only one that was actually good was, surprisingly, Qwen.
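
If you want to try it yourself, here's roughly the harness I mean, with the colours encoded as letters (G = green, Y = yellow, . = grey; the encoding is my own choice, not from any particular benchmark):

    def score(guess: str, answer: str) -> str:
        """Return Wordle feedback for a 5-letter guess as a string like 'G.Y..'."""
        result = ["."] * 5
        remaining = list(answer)
        # First pass: exact matches (green) consume their letter.
        for i, (g, a) in enumerate(zip(guess, answer)):
            if g == a:
                result[i] = "G"
                remaining.remove(g)
        # Second pass: right letter, wrong position (yellow).
        for i, g in enumerate(guess):
            if result[i] == "." and g in remaining:
                result[i] = "Y"
                remaining.remove(g)
        return "".join(result)

    print(score("crane", "cacao"))  # -> G.Y..

Each turn you feed the model its previous guesses plus these feedback strings and ask for the next guess.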


Not quite the same, but S3 does have https://aws.amazon.com/s3/features/multi-region-access-point..., which lets you treat multiple buckets in different regions as one single bucket (mostly). But you still need to set up cross-region replication yourself.
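
A minimal sketch of what using one looks like from boto3, assuming the access point and replication already exist (the ARN is hypothetical, and MRAP requests need the awscrt extra for SigV4A signing):

    import boto3  # pip install "boto3[crt]"

    s3 = boto3.client("s3")
    # A Multi-Region Access Point ARN stands in for the bucket name.
    mrap = "arn:aws:s3::123456789012:accesspoint/example-alias.mrap"  # hypothetical

    s3.put_object(Bucket=mrap, Key="hello.txt", Body=b"routed to the closest region")
    print(s3.get_object(Bucket=mrap, Key="hello.txt")["Body"].read())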

