| | Show HN: LLM Round‑Trip Translation Benchmark (github.com/lechmazur) |
| 6 points by zone411 84 days ago |
|
| | Show HN: LLM Creative Story‑Writing Benchmark V3 (github.com/lechmazur) |
| 8 points by zone411 89 days ago |
|
| | Show HN: Mapping LLM Style and Range in Flash Fiction (github.com/lechmazur) |
| 7 points by zone411 3 months ago |
|
| | Pact: Head-to-head negotiation benchmark for LLMs (github.com/lechmazur) |
| 6 points by zone411 3 months ago |
|
| | Show HN: Bazaar – a new LLM benchmark for economic reasoning under uncertainty (github.com/lechmazur) |
| 8 points by zone411 4 months ago | 1 comment |
|
| | Emergent Price-Fixing by LLM Auction Agents (github.com/lechmazur) |
| 7 points by zone411 4 months ago |
|
| | Benchmarking LLM social skills with an elimination game (github.com/lechmazur) |
| 194 points by colonCapitalDee 8 months ago | 60 comments |
|
| | Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark (github.com/lechmazur) |
| 7 points by zone411 8 months ago |
|
| | Elimination Game Benchmark: Social Reasoning, Strategy, and Deception in LLMs (github.com/lechmazur) |
| 2 points by amichail 9 months ago |
|
| | Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception (github.com/lechmazur) |
| 5 points by zone411 9 months ago |
|
| | LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21 (github.com/lechmazur) |
| 17 points by zone411 10 months ago | 3 comments |
|
| | Step-Game: Assessing LLM Collaboration and Deception Under Pressure (github.com/lechmazur) |
| 2 points by amichail 10 months ago |
|
| | Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure (github.com/lechmazur) |
| 7 points by zone411 10 months ago | 1 comment |
|
| | Show HN: LLM Thematic Generalization Benchmark (github.com/lechmazur) |
| 6 points by zone411 10 months ago |
|
| | Show HN: LLM Creative Story-Writing Benchmark (github.com/lechmazur) |
| 5 points by zone411 11 months ago |
|
| | Show HN: LLM Divergent Thinking Creativity Benchmark (github.com/lechmazur) |
| 8 points by zone411 11 months ago |
|
| | Show HN: LLM Deceptiveness and Gullibility Benchmark (github.com/lechmazur) |
| 7 points by zone411 on Oct 22, 2024 | 1 comment |
|
| | LLM Confabulation (Hallucination) Leaderboard (github.com/lechmazur) |
| 6 points by zone411 on Oct 10, 2024 |
|
| | Accurately calculating the number of legal chess positions (github.com/lechmazur) |
| 2 points by slyall on Dec 24, 2020 |
|