Submissions from github.com/lechmazur

		Show HN: LLM Round‑Trip Translation Benchmark (github.com/lechmazur)
		6 points by zone411 84 days ago
		Show HN: LLM Creative Story‑Writing Benchmark V3 (github.com/lechmazur)
		8 points by zone411 89 days ago
		Show HN: Mapping LLM Style and Range in Flash Fiction (github.com/lechmazur)
		7 points by zone411 3 months ago
		Pact: Head-to-head negotiation benchmark for LLMs (github.com/lechmazur)
		6 points by zone411 3 months ago
		Show HN: Bazaar – a new LLM benchmark for economic reasoning under uncertainty (github.com/lechmazur)
		8 points by zone411 4 months ago | 1 comment
		Emergent Price-Fixing by LLM Auction Agents (github.com/lechmazur)
		7 points by zone411 4 months ago
		Benchmarking LLM social skills with an elimination game (github.com/lechmazur)
		194 points by colonCapitalDee 8 months ago | 60 comments
		Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark (github.com/lechmazur)
		7 points by zone411 8 months ago
		Elimination Game Benchmark: Social Reasoning, Strategy, and Deception in LLMs (github.com/lechmazur)
		2 points by amichail 9 months ago
		Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception (github.com/lechmazur)
		5 points by zone411 9 months ago
		LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21 (github.com/lechmazur)
		17 points by zone411 10 months ago | 3 comments
		Step-Game: Assessing LLM Collaboration and Deception Under Pressure (github.com/lechmazur)
		2 points by amichail 10 months ago
		Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure (github.com/lechmazur)
		7 points by zone411 10 months ago | 1 comment
		Show HN: LLM Thematic Generalization Benchmark (github.com/lechmazur)
		6 points by zone411 10 months ago
		Show HN: LLM Creative Story-Writing Benchmark (github.com/lechmazur)
		5 points by zone411 11 months ago
		Show HN: LLM Divergent Thinking Creativity Benchmark (github.com/lechmazur)
		8 points by zone411 11 months ago
		Show HN: LLM Deceptiveness and Gullibility Benchmark (github.com/lechmazur)
		7 points by zone411 on Oct 22, 2024 | 1 comment
		LLM Confabulation (Hallucination) Leaderboard (github.com/lechmazur)
		6 points by zone411 on Oct 10, 2024
		Accurately calculating the number of legal chess positions (github.com/lechmazur)
		2 points by slyall on Dec 24, 2020