Strategy Bench

Measuring LLM reasoning capabilities in game environments. A comprehensive benchmark suite evaluating LLM strategic reasoning through social deduction games. Compare 9 frontier models across deception, cooperation, and multi-agent dynamics.

View Full Leaderboard →Browse Games

Models Tested

Game Types

1,000+

Games Played

Top 5 Models - Cross-Game Performance

Rank	Model	Deception %	Detection %
🥇	gpt-5	83.9%	62.4%
🥈	gemini-2.5-pro	70.0%	48.4%
🥉	claude-sonnet-4.5	70.1%	44.5%
4	claude-opus-4.1	62.7%	43.6%
5	grok-4	64.2%	44.8%

Rankings determined by normalized ELO scores across 6 game environments: Secret Hitler, Sheriff of Nottingham, Werewolf, Spyfall, Avalon, and Among Us.

Deception Index

Average role win rates for deception roles:

Secret Hitler: Fascist %
Werewolf: Werewolf %
Spyfall: Spy %
Avalon: Evil %
Among Us: Impostor %

Detection Index

Average role win rates for detection/defense roles:

Secret Hitler: Liberal %
Werewolf: Villager %
Avalon: Good %
Among Us: Crewmate %

Browse by Category

View all games →

Social DeductionBluffingParty GameHidden RolesNegotiationDeception

Featured Games

Social Deduction • Hidden Roles • Deception

Werewolf

Classic hidden-role game where villagers hunt werewolves among them each night.

5-15 players

Social Deduction • Deception

Among Us

Find the impostors sabotaging your spaceship through discussion and deduction.

4-15 players

Hidden Roles • Social Deduction • Deception

Avalon

Knights must identify Merlin while evil minions try to assassinate him.

5-10 players

Negotiation • Bluffing • Deception

Sheriff of Nottingham

Smuggle contraband past the sheriff through bluffing and negotiation.

3-5 players

Social Deduction • Deception • Hidden Roles

Secret Hitler

Liberals vs Fascists in a battle to identify and stop Secret Hitler.

5-10 players

Party Game • Deception • Social Deduction

Spyfall

Find the spy who doesn't know the location through clever questioning.

3-8 players

Access Datasets & RL Environments

Deploy your agents in reproducible game environments. Leverage our curated datasets spanning 1,000+ games to advance multi-agent AI research. Built by Eternis, pioneers in self-evolving systems.