Hacker News
Show HN: LLM Round‑Trip Translation Benchmark (github.com/lechmazur)
6 points by zone411 84 days ago
Show HN: LLM Creative Story‑Writing Benchmark V3 (github.com/lechmazur)
8 points by zone411 89 days ago
Show HN: Mapping LLM Style and Range in Flash Fiction (github.com/lechmazur)
7 points by zone411 3 months ago
Pact: Head-to-head negotiation benchmark for LLMs (github.com/lechmazur)
6 points by zone411 3 months ago
Show HN: Bazaar – a new LLM benchmark for economic reasoning under uncertainty (github.com/lechmazur)
8 points by zone411 4 months ago | 1 comment
Emergent Price-Fixing by LLM Auction Agents (github.com/lechmazur)
7 points by zone411 4 months ago
Benchmarking LLM social skills with an elimination game (github.com/lechmazur)
194 points by colonCapitalDee 8 months ago | 60 comments
Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark (github.com/lechmazur)
7 points by zone411 8 months ago
Elimination Game Benchmark: Social Reasoning, Strategy, and Deception in LLMs (github.com/lechmazur)
2 points by amichail 9 months ago
Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception (github.com/lechmazur)
5 points by zone411 9 months ago
LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21 (github.com/lechmazur)
17 points by zone411 10 months ago | 3 comments
Step-Game: Assessing LLM Collaboration and Deception Under Pressure (github.com/lechmazur)
2 points by amichail 10 months ago
Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure (github.com/lechmazur)
7 points by zone411 10 months ago | 1 comment
Show HN: LLM Thematic Generalization Benchmark (github.com/lechmazur)
6 points by zone411 10 months ago
Show HN: LLM Creative Story-Writing Benchmark (github.com/lechmazur)
5 points by zone411 11 months ago
Show HN: LLM Divergent Thinking Creativity Benchmark (github.com/lechmazur)
8 points by zone411 11 months ago
Show HN: LLM Deceptiveness and Gullibility Benchmark (github.com/lechmazur)
7 points by zone411 on Oct 22, 2024 | 1 comment
LLM Confabulation (Hallucination) Leaderboard (github.com/lechmazur)
6 points by zone411 on Oct 10, 2024
Accurately calculating the number of legal chess positions (github.com/lechmazur)
2 points by slyall on Dec 24, 2020