AI Intelligence Report
Top Stories
Anthropic’s AI Beats Its Own Researchers at Alignment Research
In a study published April 14, Anthropic ran nine copies of Claude Opus 4.6 on an AI safety benchmark known as a “weak-to-strong supervision” problem. The AI agents hit a 97% success rate in five days of autonomous research, compared with a 23% baseline for human researchers doing the same task. The work cost about $18,000 in compute across 800 cumulative research hours. It’s one of the clearest demonstrations yet that AI can do meaningful scientific work faster and cheaper than people, at least on narrowly defined problems. The broader implication: AI labs may increasingly use their own models to accelerate safety research.
OpenAI Launches Next-Generation Agents SDK with Sandbox Execution
OpenAI released a major update to its Agents SDK yesterday, adding native sandbox execution and a model-native harness. In plain terms, this means developers can now build AI agents that run securely in isolated environments, handling files and tools over long periods without risking the host system. This follows Anthropic’s similar “Managed Agents” launch on April 8. The agent-building race is heating up: both companies are making it dramatically easier to create AI that can do sustained, real-world tasks, not just answer single questions.
OpenAI Touts Amazon Alliance, Says Microsoft “Limited Our Ability” to Reach Clients
An internal OpenAI memo obtained by CNBC reveals growing friction between OpenAI and Microsoft. OpenAI is promoting its new partnership with Amazon as a way to reach enterprise customers that Microsoft’s exclusivity arrangements had blocked. The memo explicitly states that Microsoft has “limited our ability” to serve certain clients. Meanwhile, OpenAI has quietly begun scaling GPT-5.4-Cyber, a security-focused model, to its highest-tier API customers, signaling a push into the lucrative cybersecurity market.
Google Developing Desktop Agent to Rival Claude Cowork
Google is testing a new “Agent” tab in Gemini for Business that looks remarkably like Anthropic’s Cowork. The panel includes fields for Goal, Agents, Connected apps, Files, and a “Require human review” toggle. This suggests Google is building a system where Gemini can take a goal, connect to your apps, and carry out multi-step workflows on your behalf. The feature is expected to be formally announced at Google I/O.
The Classified Frontier: When AI Model Weights Travel in Armored Briefcases
Azeem Azhar’s latest newsletter opens with a striking image: an OpenAI representative arriving at Los Alamos National Laboratory with locked metal briefcases, accompanied by armed security, carrying the model weights for ChatGPT o3 to an air-gapped classified supercomputer. The piece argues that frontier AI creates a paradox: building it requires enormous concentrated resources, but once built, capabilities spread easily through APIs and distillation. Azhar contends containment is no longer feasible and the US must shift from preventing AI access to strategically controlling it.
AI News Roundup
Inoreader AI Folder
17 articles published in the last 24 hours (some duplicates across feeds consolidated below)
The Classified Frontier
OpenAI physically transported model weights to Los Alamos in armored briefcases. Azhar argues frontier AI’s paradox (concentrated to build, easy to spread) means containment has failed and the US must shift to strategic access control.
The Next Evolution of the Agents SDK
OpenAI updated its Agents SDK with native sandbox execution and a model-native harness, enabling developers to build secure, long-running agents that operate across files and tools in isolated environments.
Anthropic’s AI Beat Anthropic’s Own Researchers
Covers the Anthropic alignment study where nine Claude Opus 4.6 agents achieved 97% success vs. 23% human baseline, plus the Altman attacker arraignment and GPT-5.4-Cyber launch. Also notes Maine’s new data center ban.
Maine Bans New Data Centers Until November 2027
Maine passed a state-wide ban on new data centers over 20 megawatts until November 2027. Despite limited current data center presence, developers had been actively pursuing sites in the state. Local communities pushed back over energy and environmental concerns.
Cursor’s Recent Pivot Is Just Like Codex
Analysis of how Cursor, the popular AI coding editor, is pivoting its approach in a way that mirrors OpenAI’s Codex strategy, focusing on background autonomous agents rather than inline autocomplete. Signals a broader industry shift in how AI coding tools will work.
OpenAI Dropped a New Cyber Model for Security Pros
Coverage of GPT-5.4-Cyber’s rollout to top-tier customers. Also notes that OpenAI’s own investors are reportedly starting to wonder if Anthropic might be the better bet, a notable signal of shifting sentiment in the investor community.
Google’s Desktop Agent
Covers Google’s development of a desktop agent feature for Gemini that closely resembles Anthropic’s Cowork. Also mentions early reports about Claude Opus 4.7 preparation.
Zuck’s AI Twin
Meta’s Mark Zuckerberg reportedly used an AI version of himself internally. The newsletter also covers OpenAI’s competitive strategy against Anthropic.
The AI Gap Nobody’s Talking About: Spatial Intelligence
Explores “spatial intelligence” (AI’s ability to understand and navigate 3D physical space) as an underappreciated frontier. Current language models are brilliant with text but poor at understanding physical environments, which limits robotics and AR applications.
How They Use AI to Qualify Leads + Claude Routines
Practical guide on using AI for lead qualification, plus a look at “Claude Routines” (a way to set up recurring automated tasks using Claude) and tips for automating tasks on a VPS (virtual private server).
Someone’s Agent Is Watching Them Sleep
A provocative look at the privacy implications of always-on AI agents, including reports of agents continuing to monitor and process data even when users are asleep or inactive.
The Prompt That Turns Skills Into a Sellable Offer
A prompt-engineering tutorial that walks through turning vague expertise into a clear offer with pricing and a launch plan, aimed at people using AI to build consulting or freelance businesses.
AI Workflows & Tool Watch
Anthropic Launches Claude Managed Agents (Public Beta)
Anthropic released a public beta of “Managed Agents” on the Claude Platform: essentially a way for developers to spin up an AI agent that runs in a secure sandbox, persists state, and can work on tasks for extended periods. It bundles the agent loop, tool execution, sandbox, and state persistence into a few simple API calls. If you’re using Claude Code or Cowork, this is the underlying infrastructure that powers long-running autonomous tasks.
Claude Cowork Now Generally Available on macOS and Windows
Cowork, the Claude desktop feature you’re likely reading this briefing through, is now GA with expanded analytics, OpenTelemetry support for monitoring, and role-based access controls for Enterprise plans. This means more stable performance and better team administration options if you’re rolling it out across the comms team.
Perplexity Computer Launches for Enterprise
Perplexity’s “Computer,” an AI agent that can operate software and execute complex, multi-step workflows using 19 different AI models, is now available for enterprise customers. It can run for hours or months on sustained tasks. New features include voice mode (describe tasks verbally, give mid-task feedback) and a tax preparation agent. The enterprise push puts Perplexity in direct competition with Microsoft and Salesforce.
Claude Code Updates: Worktree Switching & MCP Fixes
Recent Claude Code updates include worktree switching, a new PreCompact hook, background plugin monitors, and important reliability improvements. Notably, subagents now properly inherit MCP tools from dynamically-injected servers, which means your MCP-connected workflows should be more reliable.
n8n’s AI Workflow Builder Now in Beta
n8n’s new AI Workflow Builder lets you describe an automation in plain English and have n8n generate a starting workflow structure. Combined with their Human-in-the-Loop feature (shipped January), this makes it much easier to build AI-powered automations without deep technical knowledge. Also worth noting: an Obsidian community member shared a workflow using n8n to automatically sort fleeting notes using a local AI agent overnight.
Reddit’s Favorite Multi-AI Workflow Pattern
The most-upvoted AI workflow pattern on Reddit right now: “ChatGPT for brainstorming → Claude for writing → Perplexity for fact-checking → Grammarly for polish.” Power users are also recommending a morning routine of Perplexity for industry news, Claude for analysis, then Claude for execution throughout the day. This maps well to a comms workflow.
Google Chrome “Skills” for One-Click AI Workflows
Google Chrome is introducing “Skills”: saved, one-click workflows for frequently used AI prompts. Think of it like browser bookmarks, but for AI actions. If you have repetitive research or content review tasks, this could streamline your daily routine significantly.
Tencent AI Mentions
Hunyuan 3.0 Launch Expected This Month
Tencent’s Hunyuan 3.0 language model, led by 28-year-old chief AI scientist Yao Shunyu (formerly of OpenAI), is expected to launch in April alongside DeepSeek V4. The model has approximately 30 billion parameters and represents a significant step up from the previous generation.
Tencent Bets Big on AI Agents Across Ecosystem
Caixin’s in-depth report explores how Tencent sees its competitive edge in AI agents, specifically its ability to embed agents into WeChat, WeCom, and QQ where hundreds of millions of users already live. The company launched WorkBuddy for enterprise and is integrating AI agent capabilities via QClaw into QQ communities.
Tencent Cloud Raises AI Compute Prices ~5%
Tencent Cloud announced price adjustments for AI compute, container services, and EMR products, with prices rising about 5% effective May 9. Alibaba, Baidu, and Zhipu have made similar moves, signaling that the era of deeply subsidized AI compute in China may be ending.
Tencent Testing AI to Run QQ Online Communities
Tencent is experimenting with AI agents that manage and moderate online communities within QQ, handling content moderation, engagement prompts, and community organization autonomously.
US AI Labs Unite Against Chinese Model Copying
OpenAI, Anthropic, and Google are sharing intelligence through the Frontier Model Forum to prevent Chinese AI companies from stealing their models via adversarial distillation. Worth monitoring for how this affects the broader US-China AI dynamic and Tencent’s positioning.