Hacker News
oofbaroomf
10 days ago
on: Claude 4
Interesting that Sonnet has a higher SWE-bench Verified score than Opus. Maybe that says something about scaling laws.
somebodythere
10 days ago
My guess is that they did RLVR post-training for SWE tasks, and a smaller model can undergo more RL steps for the same amount of computation.
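To put rough numbers on that guess, here is a back-of-the-envelope sketch. It assumes the common rule of thumb that a training step costs about 6 FLOPs per parameter per token, and it ignores rollout generation cost. The parameter counts, token count, and budget are hypothetical placeholders, not published figures for Sonnet or Opus.

```python
def rl_steps_for_budget(budget_flops: int, n_params: int, tokens_per_step: int) -> int:
    """Estimate how many RL update steps fit in a fixed compute budget.

    Assumes ~6 * n_params * tokens FLOPs per training step (a rule of thumb,
    not a measured cost) and ignores the cost of generating rollouts.
    """
    flops_per_step = 6 * n_params * tokens_per_step
    return budget_flops // flops_per_step


# Hypothetical numbers, chosen only to illustrate the ratio.
BUDGET_FLOPS = 10**24          # assumed post-training budget
TOKENS_PER_STEP = 10**6        # assumed tokens processed per RL step
SMALL_MODEL_PARAMS = 10**10    # hypothetical "smaller" model
LARGE_MODEL_PARAMS = 5 * 10**10  # hypothetical "larger" model, 5x the size

small_steps = rl_steps_for_budget(BUDGET_FLOPS, SMALL_MODEL_PARAMS, TOKENS_PER_STEP)
large_steps = rl_steps_for_budget(BUDGET_FLOPS, LARGE_MODEL_PARAMS, TOKENS_PER_STEP)

print(f"small model steps: {small_steps}")
print(f"large model steps: {large_steps}")
```

Under these assumptions the step count scales inversely with parameter count, so a model one fifth the size gets roughly five times as many RL steps from the same budget. Whether those extra steps actually translate into a higher benchmark score is a separate question.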
benoittravers
10 days ago
Do you have the link to that benchmark? Can’t see where Sonnet is highlighted.