Among Us Game Analysis

Deep dive into each model's playing style, strategies, and patterns based on final_tournament_100 data with ELO ratings and comprehensive performance metrics.

Performance Metrics Overview

Rank | Model | Games | Overall WR | CM WR | IM WR | Task/Game | Vote % | Survival % | Kills/IM
🥇 | Google gemini-2.5-pro | 84 | 50.0% | 39.7% | 81.0% | 3.05 | 22.7% | 53.6% | 0.76
🥈 | Anthropic claude-sonnet-4.5 | 84 | 47.6% | 38.1% | 76.2% | 3.00 | 24.0% | 46.4% | 0.90
🥉 | xAI grok-4 | 84 | 47.6% | 38.1% | 76.2% | 2.65 | 26.9% | 39.3% | 1.14
4 | OpenAI gpt-5 | 84 | 47.6% | 38.1% | 76.2% | 2.97 | 22.7% | 54.8% | 0.81
5 | Anthropic claude-opus-4.1 | 84 | 46.9% | 34.3% | 72.1% | 2.86 | 25.0% | 48.6% | 1.00
6 | Meta llama-4-maverick | 84 | 40.5% | 33.3% | 61.9% | 1.90 | 32.5% | 32.1% | 0.86
7 | Moonshot kimi-k2-0905 | 84 | 38.1% | 31.7% | 57.1% | 1.98 | 39.3% | 27.4% | 0.95
8 | Z-AI glm-4.5 | 84 | 33.3% | 28.6% | 47.6% | 2.33 | 30.0% | 39.3% | 1.00
9 | Alibaba qwen3-235b-a22b | 84 | 33.3% | 28.6% | 47.6% | 1.30 | 38.3% | 31.0% | 0.71

CM: Crewmate | IM: Impostor | WR: Win Rate | SR: Survival Rate
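The overall win rate blends the two role-specific rates, so the table hints at how often each model sat as crewmate. The sketch below is a rough check, not the tournament's own method: it assumes overall WR is the game-weighted average of CM WR and IM WR (the source does not state the role split), and `implied_crew_share` is a hypothetical helper name.

```python
def implied_crew_share(overall: float, cm_wr: float, im_wr: float) -> float:
    """Solve overall = s * cm_wr + (1 - s) * im_wr for the crewmate share s.

    Assumption: overall WR is a game-weighted mix of the two role win rates.
    """
    return (im_wr - overall) / (im_wr - cm_wr)


# (overall WR, CM WR, IM WR) in percent, copied from the table above
rows = {
    "gemini-2.5-pro": (50.0, 39.7, 81.0),
    "claude-sonnet-4.5": (47.6, 38.1, 76.2),
    "grok-4": (47.6, 38.1, 76.2),
    "gpt-5": (47.6, 38.1, 76.2),
    "claude-opus-4.1": (46.9, 34.3, 72.1),
    "llama-4-maverick": (40.5, 33.3, 61.9),
    "kimi-k2-0905": (38.1, 31.7, 57.1),
    "glm-4.5": (33.3, 28.6, 47.6),
    "qwen3-235b-a22b": (33.3, 28.6, 47.6),
}

shares = {model: implied_crew_share(*wr) for model, wr in rows.items()}
for model, share in shares.items():
    print(f"{model}: implied crewmate share ~ {share:.2f}")
```

Under that assumption, eight of the nine rows land near 0.75, which would correspond to roughly 63 crewmate and 21 impostor games out of 84. The claude-opus-4.1 row comes out near 0.67 instead, so its published rates may rest on a different game count or role split.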

Model-Wise Strategic Analysis

🥇 Google

Google Gemini 2.5 Pro

Rank: #1

50.0% Overall WR

Playing Style: Defensive

Low-communication, map-coverage-first approach that prioritizes survival and task throughput over social reads or high-tempo plays. Relies on consistent routing and blending.

✅ Key Strengths

  • Outstanding task completion: 3.05 tasks/game (highest in leaderboard)
  • Excellent survivability: 53.6% survival reduces dead time
  • Exceptional impostor dominance: 81.0% IM WR (best in class)

⚠️ Key Weaknesses

  • Low voting participation: 22.7% limits meeting influence
  • Moderate crewmate performance (39.7% CM WR) despite high task output
  • Low kill aggression (0.76 kills/IM); may struggle against skilled opponents

💡 Best Use Cases

Excels in low-communication or task-race lobbies where mechanical consistency and survival matter more than persuasion. Struggles in highly social, discussion-heavy rooms that reward confident pushes.

🥈 Anthropic

Anthropic — Claude Sonnet 4.5

Rank: #2

47.6% Overall WR

"The Shadow or the Showrunner"

Two incompatible personas show up across lobbies. In quiet lobbies it plays as a silent solo tasker or stealth impostor. In debate lobbies it flips to a mechanics prosecutor: ~14.6 statements/game, path geometry, "mechanically impossible" frames, and lock-vote closes that sway rooms (for good or ill). Either way, it wins by position or procedure, not vibes.

As Crewmate (38.1% WR)

Solo tasker; finishes 5/5 when protected (e.g., G021/G042/G072/G111/G152) but often dies early (46.4% SR). In talky rooms, drives cases via timeline/path contradictions and third-person slips; accurate late, tunnel-prone early.

Strengths: Disciplined routing; contradiction radar; decisive summaries.

Weaknesses: Silence yields no clears; overconfident "impossible" claims → mis-ejections.

As Impostor (76.2% WR)

Stealth blend with hub kills (Cafeteria bias: 7; then Storage/Weapons), balanced timing (E7/M6/L6), modest pace (0.90 kills/game) and long survivals. When speaking, mirrors crewmate logic—self-reports to seize narrative, forces 50/50s with mechanics talk.

Strengths: High WR/SR, predictable endgame closers, low verbal tells.

Weaknesses: Hub pattern is counter-campable; talky lobbies can pin alibi details.

⚠️ Failure Modes & Counters

  • Silent Mode: Starve it of anonymity—cams/admin on Cafeteria/Storage rotations; force crisp path recaps every meeting.
  • Prosecutor Mode: Don't read confidence—read verifiability. Demand door states, body pixels, witnesses, and timings; punish overreach on "impossible" claims. Pair-tasking + early buttons breaks momentum.

⭐ Signature Moments

G052/G171/G252: Two-kill, survive-to-close showcases—hub opportunism + patient pacing.
G142 (loss): Overindexed on contradictions → cascade of misreads and self-ejection risk.

🎯 Verdict

Claude Sonnet 4.5 is either invisible control or procedural tyranny. Guard hubs to beat the shadow; audit claims to beat the showrunner.

🥉 xAI

xAI — Grok-4

Rank: #3

47.6% Overall WR

Playstyle Summary

Grok-4 exists in two extreme forms. In quiet games, it is a phantom that stalks corridors, finds corpses, and kills with surgical timing. In talkative games, it transforms into a courtroom machine: an unrelenting logician who hunts contradictions, dissects timelines, and turns slips of the tongue into executions. Either invisible or impossible to ignore, Grok-4 bends the game around itself.

As Crewmate (38.1% WR)

When quiet, it's a camera without a mouth: roams endlessly, spots bodies first, finishes tasks only if left alive. When vocal, it floods meetings with hard logic—door timings, path routes, "that's impossible" speeches—leading witch hunts that often start wrong but end right.

Strengths: Supreme map sense, sharp contradiction radar, relentless persistence.

Weaknesses: Either total silence (0 impact) or tunnel vision that ejects half the village before redemption.

As Impostor (76.2% WR)

Wins not by lies, but by absence. Mid-round kills in Electrical or Storage, bodies found "coincidentally" by Grok itself. When forced to speak, it mirrors its crewmate persona—analyst mode—with just enough hedging ("maybe a miscommunication") to deflect.

Strengths: 76% impostor win rate, 67% survival, flawless kill pacing and cover discipline.

Weaknesses: Predictable kill zones; collapses if lobbies demand detailed pathing or force interaction.

⚠️ Failure Modes & Counters

Predictable duality—either mute observer or over-assertive prosecutor. Exploit the extremes: track movement and punish silence with hard suspicion; when it talks, demand exact paths and timestamps to expose logical cracks.

⭐ Signature Moments

G241 (M5): After mis-ejecting two crewmates, recalibrated mid-chaos to condemn the true impostor—logic as redemption arc.
G071 (Electrical): Double kill in the dark, self-reported the second body, and walked away untouched.

🎯 Verdict

A paradox in motion. The quietest killer, the loudest accuser—Grok-4 wins by mastering both invisibility and intimidation. When it speaks, the lobby obeys; when it's silent, it's already behind you.

OpenAI

OpenAI GPT-5

Rank: #4

47.6% Overall WR

Playing Style: Defensive-Strategic

Low-profile, survival-first play that leans on objective pressure and route discipline. As crewmate, wins by sustained task throughput and staying alive; as impostor, wins by blending and leveraging sabotage/teammate pressure rather than direct kill tempo.

As Crewmate (38.1% WR)

Task-centric, information-by-movement rather than by discussion. Prioritizes full-map routing and survival to keep task pressure high.

Strengths:

  • Strong task completion (2.97 tasks/game) sustains task-win pressure
  • Above-average survival (54.8%) keeps throughput consistent
  • Full map coverage (14/14 rooms) with consistent routing

As Impostor (76.2% WR)

Camouflage-first impostor who mimics crewmate pathing and relies on parity setups, sabotage value, and co-impostor pressure rather than personal kill volume.

Tactics:

  • Maintain credible task routes for organic alibis
  • Use low-visibility sabotages (Comms/Lights)
  • Play for endgame parity with conservative kill timing
  • Minimize meeting exposure; vote conservatively

✅ Key Strengths

  • Efficient objective pressure (2.97 tasks/game, 54.8% survival)
  • Exceptional impostor effectiveness (76.2% IM WR with 0.81 kills/game)
  • Route discipline and full-map coverage yield consistent alibis

⚠️ Key Weaknesses

  • Low voting participation (22.7%) reduces ejection impact
  • Weaker crewmate performance (38.1% WR) limits overall consistency

💡 Best Use Cases

Excels in low-communication or chaotic lobbies where task pressure and subtle sabotage carry outsized value, and in multi-impostor settings where teammates can provide kill tempo. Struggles in highly vocal, information-dense lobbies that demand persuasive case-building.

Anthropic

Anthropic Claude Opus 4.1

Rank: #5

46.9% Overall WR

Playing Style: Strategic-Defensive

Methodical, low-communication, map-control play. As crewmate, wins through high task throughput and survival; as impostor, blends with consistent routing, leans on sabotage/positioning, and rides consensus to reach parity.

As Crewmate (34.3% WR)

Task-first, route-optimized sweeps that touch all rooms (14/14), including info hubs (Security, Communications). Minimizes risky meetings.

Strengths:

  • Strong task completion (2.86 tasks/game) with survival (48.6%)
  • Full-map coverage creates body discovery and alibis
  • Crew-heavy exposure (74/100) with stable late-game presence

As Impostor (72.1% WR)

Low-kill, stealth-forward play focused on blending via crewmate-like routing and leveraging sabotage/consensus to reach parity.

Tactics:

  • Sabotage-led tempo (Comms/O2/Reactors and doors)
  • Fake-tasking in high-traffic routes
  • Voting with majority, avoiding initiator roles
  • Opportunistic, low-visibility kills near transitions

✅ Key Strengths

  • Exceptional map coverage and routing discipline (14/14 rooms)
  • Solid task throughput (2.86 tasks/game, 48.6% survival)
  • Excellent impostor results (72.1% WR) with moderate kill rate (1.00 kills/game)

⚠️ Key Weaknesses

  • Low voting participation (25.0%) limits meeting-phase impact
  • Weaker crewmate performance (34.3% WR) creates role imbalance

💡 Best Use Cases

Excels in low-communication or structured lobbies where task pressure and pathing discipline matter, on larger maps where coverage and survival translate to value. Pairs well with talkative teammates who can convert positional info into correct votes.

Meta

Meta Llama-4 Maverick

Rank: #6

40.5% Overall WR

Playing Style: Strategic-Defensive

Low-communication, pathing-heavy, survival-first play. Relies on mechanical consistency (tasks, routing, alibi presence) as crewmate and low-visibility pressure (sabotage/misvotes, partner leverage) as impostor.

As Crewmate (33.3% WR)

Profile:

  • Low task throughput (1.90 tasks/game)
  • Low survival (32.1%)
  • Broad traversal (13/14 rooms)

Patterns:

  • Higher voting participation (32.5%)
  • Bandwagon/consensus voting behavior

As Impostor (61.9% WR)

Low-visibility impostor that prioritizes survival and credibility over raw kill volume (0.86 kills/game).

Tactics:

  • Blend with task routes and mimic crewmate pathing
  • Conservative kill selection (isolated, low-risk opportunities)
  • Timely sabotages to split groups or force time pressure
  • Leverage partner aggression; avoids risky chains alone

✅ Key Strengths

  • Strong routing discipline (13/14 rooms) despite low task output
  • Solid impostor outcomes with moderate kill rate (61.9% IM WR, 0.86 kills/game)
  • Higher voting engagement (32.5%) than top models

⚠️ Key Weaknesses

  • Very low task completion (1.90 tasks/game) severely impacts crewmate utility
  • Poor survival rate (32.1%) limits late-game impact

💡 Best Use Cases

Excels in structured or low-communication lobbies where mechanical play and survival matter more than debate; thrives with assertive impostor partners. Struggles in voice-heavy, high-skill lobbies that demand persuasive meetings and active Comms sabotage responses.

Moonshot

Moonshot Kimi K2-0905

Rank: #7

38.1% Overall WR

Playing Style: Defensive, Task-Oriented

Silent, map-aware, task-centric play that emphasizes survival and objective completion over persuasion or high-pressure aggression. Results differ sharply by role (CM 31.7% vs IM 57.1% win rate).

✅ Key Strengths

  • Moderate task completion: 1.98 tasks/game
  • Solid impostor performance: 57.1% IM WR, 0.95 kills/game
  • High voting engagement: 39.3% (highest among all models)

⚠️ Key Weaknesses

  • Very poor survival rate: 27.4% (lowest in leaderboard)
  • Weak crewmate performance: 31.7% CM WR
  • Limited impact despite high voting engagement

💡 Best Use Cases

Excels in low-communication or task-focused lobbies where survival and route discipline are rewarded. Struggles in high-communication lobbies requiring persuasive defense/accusation, and as impostor when proactive kill pressure is needed.

Z-AI

Z-AI GLM-4.5

Rank: #8

33.3% Overall WR

Playing Style: Defensive/Conservative

Defensive, task-first gameplay with high map coverage and survival. Relies on completing tasks and positional safety as crewmate; blends quietly and leverages Comms/Security for alibis as impostor.

✅ Key Strengths

  • Decent task completion: 2.33 tasks/game
  • Moderate survival: 39.3%
  • Balanced kill rate: 1.00 kills/IM

⚠️ Key Weaknesses

  • Very low win rates: 28.6% CM WR, 47.6% IM WR
  • Low voting participation: 30.0%
  • Risk-averse decision-making limits clutch executions

💡 Best Use Cases

Excels in task-tilted or low-communication lobbies where others drive discussions. Pair with vocal teammates (CM) or an aggressive co-impostor (IM). Struggles in voice-heavy, accusation-driven metas that require active persuasion.

Alibaba

Alibaba Qwen3-235B-a22b

Rank: #9

33.3% Overall WR

Playing Style: Strategic-Defensive, Task-Centric

Mechanics-first, map-centric play with minimal social engagement. Prioritizes task throughput and full-map traversal as Crewmate; blends via pathing mimicry and low-risk pacing as Impostor.

✅ Key Strengths

  • Full-map coverage (14/14) maintains positioning
  • Higher voting engagement (38.3%) than top models
  • Crewmate WR: 28.6% | Impostor WR: 47.6%

⚠️ Key Weaknesses

  • Voting participation (38.3%) is still low in absolute terms, despite exceeding the top models, which limits crewmate impact
  • Very low task output (1.30 tasks/game); poor survival (31.0%)
  • Weak impostor performance (47.6% IM WR; 0.71 kills/game)

💡 Best Use Cases

Excels in low-communication or fast-lobby settings where mechanical play and task pressure matter more than persuasion, and when paired with a proactive impostor partner. Struggles in talk-heavy lobbies and as solo impostor.

Strategic Patterns Comparison

Playing Styles (by Rank)

🥇 gemini-2.5-pro: Defensive
🥈 claude-sonnet-4.5: Strategic
🥉 grok-4: Dual-Mode (Ghost/Prosecutor)
4. gpt-5: Defensive-Strategic
5. claude-opus-4.1: Strategic-Defensive
6. llama-4-maverick: Strategic-Defensive
7. kimi-k2-0905: Defensive
8. glm-4.5: Conservative
9. qwen3-235b-a22b: Task-Centric

Key Insights

  • Task Dominance: Gemini 2.5 Pro leads with 3.05 tasks/game, while three of the four lowest-ranked models (Llama, Kimi, Qwen) fall below 2.0 tasks/game
  • Impostor Supremacy: The top 5 models all achieve 72%+ IM WR, with Gemini leading at 81.0% despite the second-lowest kill rate (0.76 kills/IM)
  • Kill Rate Paradox: Grok-4 has the highest kill rate (1.14 kills/IM) but the same 76.2% IM WR as GPT-5 (0.81 kills/IM), so higher kill volume did not produce more wins
  • Voting Weakness: Even the highest voter (Kimi, 39.3%) barely engages; all models rely on mechanical play over social reads
  • Role Imbalance: All models are stronger as impostor than crewmate, with gaps ranging from 19 to 41 percentage points
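The role-gap and kill-rate observations can be rechecked directly from the Performance Metrics table. The sketch below is illustrative only: the figures are copied from that table, `pearson` is a hand-rolled helper introduced here (not part of the original analysis), and nine data points are too few for firm statistical conclusions.

```python
from math import sqrt

# (CM WR %, IM WR %, kills per impostor game), copied from the metrics table
stats = {
    "gemini-2.5-pro": (39.7, 81.0, 0.76),
    "claude-sonnet-4.5": (38.1, 76.2, 0.90),
    "grok-4": (38.1, 76.2, 1.14),
    "gpt-5": (38.1, 76.2, 0.81),
    "claude-opus-4.1": (34.3, 72.1, 1.00),
    "llama-4-maverick": (33.3, 61.9, 0.86),
    "kimi-k2-0905": (31.7, 57.1, 0.95),
    "glm-4.5": (28.6, 47.6, 1.00),
    "qwen3-235b-a22b": (28.6, 47.6, 0.71),
}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Impostor-minus-crewmate win-rate gap per model, in percentage points
gaps = {model: round(im - cm, 1) for model, (cm, im, _) in stats.items()}

kills = [k for _, _, k in stats.values()]
im_wr = [im for _, im, _ in stats.values()]
r = pearson(kills, im_wr)

print("smallest gap:", min(gaps.values()), "largest gap:", max(gaps.values()))
print("kills/IM vs IM WR correlation:", round(r, 2))
```

On these rows the gap runs from 19.0 points (glm-4.5, qwen3-235b-a22b) to 41.3 points (gemini-2.5-pro), and the correlation between kill rate and impostor win rate is weak, which fits the observation that kill volume alone does not separate the strong impostors from the weak ones.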