This was Opus week. On May 28 Anthropic shipped Claude Opus 4.8, and the number that matters is not the agentic-coding benchmark (up to 69.2%) - it is the honesty one: the new model is roughly four times less likely than 4.7 to let a flaw in its own code pass unremarked. That is the gap between an agent you supervise line by line and one you can leave running.
Anthropic held pricing flat at $5 / $25 per million tokens, added a fast mode that is 2.5x quicker and 3x cheaper than before, and - the same day - closed a $65B raise. The flagship got better and cheaper to run in the same breath. Which made it the perfect week to ask whether the flagship is still the only answer: we ran Opus 4.8 head to head against typed on fourteen medium-to-hard coding tasks, graded against real test suites. They tied, 14-14. The quality gap has narrowed to a billing decision.
Below: the Opus 4.8 details and its same-day Copilot GA, a busy Claude Code release train, and a broader week where OpenAI put Codex hands-on-keyboard inside Windows, MiniMax teased a 15.6x decoding speedup, and Mistral pivoted to a full-stack enterprise play.
From the Yaw blog
- typed vs Opus 4.8: a coding head-to-head - we graded Opus 4.8 and typed on fourteen medium-to-hard Claude Code tasks against real test suites; they tied 14-14, leaving price and billing as the only real differentiators
- mcp.hosting is now Yaw MCP - the rebrand folds mcp.hosting into the Yaw product family; lazy-loaded tool dispatch keeps ~10 meta-tools in context instead of hundreds, with encrypted backend credential storage and A-F server compliance grades
Anthropic this week
- Introducing Claude Opus 4.8 - the new flagship is around 4x less likely to let its own code flaws pass, pushes agentic coding to 69.2%, holds pricing flat at $5 / $25 per million tokens, and ships a fast mode that is 2.5x faster and 3x cheaper; Anthropic closed a $65B raise the same day
- Opus 4.8 is GA in GitHub Copilot - same-day general availability, landing the new flagship in Copilot's chat and agent surfaces the day it shipped
Claude Code this week
- v2.1.150 through v2.1.157 (May 25-29) - Auto mode reaches Bedrock, Vertex, and Foundry for Opus 4.7 and 4.8 (opt in with
CLAUDE_CODE_ENABLE_AUTO_MODE=1); plugins in.claude/skillsauto-load without a marketplace andclaude plugin initscaffolds new ones; plus background-session stabilization and security fixes for MCP and auth, including a custom-gateway OAuth credential leak
The broader week
- OpenAI gives Codex Computer Use on Windows - Codex can now see, click, and type inside Windows apps to test and refine its own builds, with remote continuation from mobile or Mac and new Codex Profiles for usage and token tracking
- MiniMax teases M3 with a new sparse-attention design - MiniMax Sparse Attention claims up to 15.6x faster decoding and 9.7x faster prefill at a 1M-token context; weights, benchmarks, and a date are still under wraps, with the full model due in the second half of 2026
- Mistral shifts to a full-stack strategy - Le Chat becomes "Vibe," a unified agent platform, alongside a new Industrial AI stack for aerospace, automotive, and semiconductors (Airbus, BMW, and ASML named); the pitch leans on European data sovereignty and on-prem deployment