Detailed Summary
The video covers a batch of new AI releases: Google's Gemini 3 Pro, the Google Anti-Gravity IDE, and Nano Banana Pro, plus OpenAI's GPT 5.1 Codex Max. It presents Gemini 3 Pro as the new top model on benchmarks, ahead of Claude Sonnet 4.5 (the model behind Claude Code). Google Anti-Gravity is dismissed as low signal because it is a VS Code fork much like Cursor. Nano Banana Pro, by contrast, is praised as a highly capable image generation model. The capability the video singles out as genuinely new for engineers is giving AI agents their own dedicated computers.
Gemini 3 Pro Agent Sandboxes (2:00 - 5:50)
The presenter runs Gemini 3 Pro through its CLI inside agent sandboxes. The first examples generate and host an SVG of a pelican on a skateboard and a set of three banana-themed Pokémon cards. The focus then shifts to more complex full-stack applications: Gemini 3 Pro builds an SQLite CRUD interface, a Nano Banana Pro UI, and a note-taking application, each in its own sandbox. This is how compute scales: multiple Gemini agents run in parallel, each operating its own computer.
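Launching several agents at once, each on its own machine, is a fan-out pattern. The sketch below is a minimal local analogy, not the presenter's actual setup: the hypothetical `run_agent_in_sandbox` function stands in for an agent working in a remote sandbox, and a temporary directory stands in for the sandbox itself.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_agent_in_sandbox(task: str) -> dict:
    """HYPOTHETICAL stand-in for one agent working on one task in its own sandbox.

    A real setup would create a remote sandbox (the video uses E2B) and drive a
    coding agent such as the Gemini CLI inside it. Here a throwaway temporary
    directory plays the role of the agent's private computer.
    """
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workdir:
        root = Path(workdir)
        # The agent only writes inside its own directory, so runs cannot collide.
        (root / "README.md").write_text(f"# {task}\n")
        files = sorted(p.name for p in root.iterdir())
    # The directory (the "sandbox") is deleted when the with-block exits.
    return {"task": task, "files": files}


def launch_parallel(tasks: list[str], max_workers: int = 4) -> list[dict]:
    """Run one agent per task concurrently; results come back in task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_agent_in_sandbox, tasks))


if __name__ == "__main__":
    apps = ["SQLite CRUD interface", "Nano Banana Pro UI", "note-taking app"]
    for result in launch_parallel(apps):
        print(result)
```

Threads are enough for this shape of work because a real agent spends most of its time waiting on a remote sandbox and a model API; replacing the stub with real sandbox calls should not need to change the fan-out structure.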
Do Model Releases Matter Anymore? (5:50 - 9:49)
With 15 agent sandboxes running, the discussion turns to whether model performance still matters. E2B is identified as the service hosting the sandboxes. Giving each agent its own environment enables greater agent autonomy and the 'best of N' pattern, in which several agents attempt the same task and the best result is kept. Third-party benchmarks confirm Gemini 3 Pro as the top model, but the video argues that raw model intelligence is becoming less critical. What matters more is the 'agentic experience' (the agent, its tooling, and its workflows) and how well the whole system performs on specific use cases. The limiting factor is no longer the language model but the agentic systems engineers build around it.
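The selection half of 'best of N' is simple once every agent's output can be scored. The sketch below assumes a hypothetical score, such as the fraction of smoke tests an agent's app passes; the video does not specify how results are ranked, and the sample results are made up for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Return the highest-scoring candidate out of N independent attempts."""
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    # max() returns the first maximal element, so ties go to the earliest attempt.
    return max(candidates, key=score)


if __name__ == "__main__":
    # HYPOTHETICAL results from three agents building the same app.
    results = [
        {"agent": "agent-1", "checks_passed": 3, "checks_total": 5},
        {"agent": "agent-2", "checks_passed": 5, "checks_total": 5},
        {"agent": "agent-3", "checks_passed": 5, "checks_total": 5},
    ]
    winner = best_of_n(results, score=lambda r: r["checks_passed"] / r["checks_total"])
    print(winner["agent"])  # agent-2
```

Keeping the scoring function separate from the selection makes it easy to swap in a different judge, for example a human review or a second model, without touching the fan-out code.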
Wow! Thank you - 100k Subs! (9:49 - 11:52)
The presenter thanks viewers for helping the channel reach 100,000 subscribers, acknowledging its niche focus on deep tech, engineering, and agentic engineering. The channel's stated goal is to build living software that works autonomously. Upcoming content includes year-end predictions for 2026 and the final lessons of Agentic Horizon.
Reprogramming Agents for Agent Skills (11:52 - 17:22)
The video explains how agents are reprogrammed to use 'agent skills' through custom slash commands. Each command maps a short piece of syntax to a shared, reusable prompt stored in a memory file, so agents can understand and execute complex workflows such as 'plan, build, host, test' for full-stack applications. The agent sandbox skill, a 170-line prompt, walks the agent through reading the documentation, initializing the sandbox, creating a plan, building, hosting, and testing.
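The mapping from a short command to a long shared prompt can be sketched as a lookup plus argument substitution. This is a minimal illustration, not the presenter's implementation: the one-file-per-command layout, the `$ARGUMENTS` placeholder, and the `load_prompts` and `expand` names are all assumptions.

```python
import tempfile
from pathlib import Path


def load_prompts(directory: Path) -> dict[str, str]:
    """Map each Markdown file to a command name: sandbox.md -> /sandbox."""
    return {p.stem: p.read_text() for p in directory.glob("*.md")}


def expand(line: str, prompts: dict[str, str]) -> str:
    """Replace '/name args' with the shared prompt registered under 'name'."""
    if not line.startswith("/"):
        return line  # ordinary message: pass through unchanged
    name, _, args = line[1:].partition(" ")
    if name not in prompts:
        raise KeyError(f"unknown command: /{name}")
    # ASSUMPTION: prompts mark where the user's arguments go with $ARGUMENTS.
    return prompts[name].replace("$ARGUMENTS", args.strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "sandbox.md").write_text(
            "Read the sandbox docs, initialize a sandbox, then plan, build, "
            "host and test: $ARGUMENTS"
        )
        prompts = load_prompts(Path(d))
    print(expand("/sandbox note-taking app with persistence", prompts))
```

Because every agent loads the same prompt files, one edit to a skill's prompt changes the behavior of every agent that invokes it.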
Full Stack Agent Results (17:22 - 26:45)
The full-stack applications built by Sonnet, Gemini 3 Pro, and Codex are then reviewed. Claude Sonnet 4.5 delivers a fully functional SQLite CRUD interface with persistence. Gemini 3 Pro completes a note-taking application with persistence, despite some UI issues. GPT 5.1 Codex Max produces a Nano Banana image generation UI with impressive styling, but struggles to get the image generation itself working. Gemini 3 Pro and Codex both show strong capabilities, yet Claude Sonnet 4.5 most consistently delivers working versions, which the video takes as evidence that the complete agentic system matters more than raw model power. The 'best of N' pattern is reinforced as a way to mitigate individual agent failures.
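Why 'best of N' mitigates individual failures can be made concrete with a little arithmetic. If each agent succeeds with probability p and the attempts are independent, the chance that at least one of N succeeds is 1 - (1 - p)^N; with p = 0.5, three agents give 0.875. Independence is an optimistic assumption, since agents running the same model on the same prompt often fail in correlated ways, so treat these numbers as an upper bound. The function names are hypothetical.

```python
def p_at_least_one_success(p: float, n: int) -> float:
    """Chance that at least one of n independent agents succeeds (each with prob. p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p) ** n


def agents_needed(p: float, target: float) -> int:
    """Smallest n reaching the target success chance (linear search; fine for realistic p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    n = 1
    while p_at_least_one_success(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, p_at_least_one_success(0.5, n))
    print("agents for 95%:", agents_needed(0.5, 0.95))
```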
Agent Sandbox Skill Breakdown (26:45 - 29:50)
The agent sandbox skill is broken down step by step, showing how it guides an agent through the entire development process inside an isolated environment. Sandboxes provide isolation for security, scalability for deploying many agents at once, and more autonomy for agents to complete their tasks. The video reiterates that the model is no longer the primary limitation; the bottleneck is the engineer's ability to deploy effective agentic systems and compute at scale. The 'best of N' pattern is emphasized as a key technique for making the most of AI compute and scaling engineering impact.
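The real skill is a roughly 170-line prompt that the agent reads and follows, not code. As a rough analogy only, the sketch below models its step order as a pipeline that halts at the first failing step and records where it stopped, which is the point at which a 'best of N' setup would rely on another agent's attempt instead. The step names, `run_skill`, and `RunLog` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Step order taken from the summary of the agent sandbox skill.
STEPS = ["read_docs", "init_sandbox", "plan", "build", "host", "test"]

StepFn = Callable[[dict], bool]  # receives shared context, returns success


@dataclass
class RunLog:
    completed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None


def run_skill(step_fns: Dict[str, StepFn], context: Optional[dict] = None) -> RunLog:
    """Run the steps in order, stopping at the first one that reports failure."""
    context = {} if context is None else context
    log = RunLog()
    for step in STEPS:
        if not step_fns[step](context):
            log.failed_step = step
            break
        log.completed.append(step)
    return log


if __name__ == "__main__":
    def ok(ctx: dict) -> bool:
        return True

    def host_fails(ctx: dict) -> bool:
        return False  # e.g. the app's server never came up

    steps = {name: ok for name in STEPS}
    steps["host"] = host_fails
    print(run_skill(steps))
```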