Detailed Summary
Introduction to the Super Agent (0:04 - 1:50)
The video begins with a demonstration of a voice-controlled orchestrator, ADA, creating and commanding two Claude Code agents, Sony and Blink. Sony is tasked with planning video generation support for the Sora API, while Blink is instructed to update the frontend UI. The presenter introduces the core philosophy: instead of picking one AI tool, engineers should combine them to maximize capability, advocating for thinking in "ands" not "ors." This integration of Gemini 2.5 Computer Use, OpenAI Realtime API, and Claude Code forms the "Big 3 Super Agent."
Agent Status and Observability (1:52 - 5:00)
The presenter checks on Sony and Blink through ADA: Blink is implementing the UI changes, and Sony is planning the Sora API integration. The system pairs the voice orchestrator with multi-agent observability, a live feed of agent activity built on Claude Code hooks. A key innovation is that agents validate their own work by driving a browser agent powered by the Gemini 2.5 Computer Use model, closing the loop between building and verifying. This self-validation, combined with real-time monitoring, is what makes it practical to scale compute and impact.
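The closed loop described above can be sketched as a simple build-validate-retry cycle. This is a minimal illustration, not the video's actual code: the function names (`implement`, `validate_in_browser`) and the retry policy are assumptions.

```python
# Minimal sketch of a closed-loop build/validate cycle, assuming a coding
# agent that applies changes and a browser agent (e.g. one driven by a
# computer-use model) that checks the result. All names are hypothetical.

def implement(task: str) -> str:
    """Stand-in for a coding agent applying a change; returns a summary."""
    return f"applied: {task}"

def validate_in_browser(summary: str) -> bool:
    """Stand-in for a browser agent verifying the change in the live UI."""
    return "applied" in summary  # a real check would screenshot and inspect

def closed_loop(task: str, max_attempts: int = 3) -> bool:
    """Build, validate, and retry until the browser check passes."""
    for attempt in range(1, max_attempts + 1):
        summary = implement(f"{task} (attempt {attempt})")
        if validate_in_browser(summary):
            return True  # validated: the work is done
    return False  # surfaced to the orchestrator for human review

print(closed_loop("update frontend UI"))  # → True
```

The value of the loop is that failure feeds back into another attempt automatically, so the engineer only sees work that has already passed validation.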
Agentic Planning and Custom Commands (5:00 - 7:30)
ADA confirms Sony has completed its planning task, generating a detailed specification document for Sora video generation. Blink has completed its UI code changes and is preparing for validation. The presenter opens the generated specs file, showcasing the architecture, API endpoints, and project structure. The Claude Code agents run on Claude Code 2 with the Claude Sonnet 4.5 model. Custom slash commands, like /plan, are highlighted as a powerful feature for reusable compute and structured results, allowing agents to generate plans in a predefined format.
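In Claude Code, custom slash commands are markdown prompt files placed in the project's `.claude/commands/` directory, with `$ARGUMENTS` substituted from whatever follows the command. The file below is a hypothetical reconstruction of what a /plan command might look like, not the video's actual file; the output path and section list are assumptions.

```markdown
<!-- .claude/commands/plan.md — hypothetical example, not the video's file -->
Create a detailed specification for: $ARGUMENTS

Write the result to a file under specs/ using this structure:
- Overview and goals
- Architecture and project structure
- API endpoints (method, path, request/response shapes)
- Validation steps the implementing agent must run
```

Because the output format is fixed in the command, a later /build command can consume the spec without any extra instruction, which is what makes the compute reusable.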
Agent Implementation and Initial Challenges (7:30 - 9:36)
The presenter instructs Sony to build the backend from its generated specification file using a /build custom command, and commands Blink to read the same specification. Both agents kick off, and their progress is observed through the multi-agent observability system. The presenter reiterates the theme of scaling compute with better agents, more agents, and custom agents, then orchestrating them effectively. Blink finishes reading the spec and is commanded to implement the frontend, focusing on fault tolerance and retry logic, while Sony continues with the backend. A brief issue arises when Sony's backend work interferes with Blink's frontend, highlighting the need for clear agent boundaries.
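One way to enforce the agent boundaries that this interference calls for is to pin each agent to a directory scope and have the orchestrator reject cross-boundary work. This is a sketch under that assumption; the `Agent` structure, scope rule, and agent names mirror the video's setup only loosely.

```python
# Sketch of directory-scoped agent boundaries: each agent may only touch
# files under its assigned scope, and the orchestrator refuses anything
# else. The structure and names here are illustrative, not the video's.

from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    scope: str  # directory this agent is allowed to modify

def command_agent(agent: Agent, target_path: str, instruction: str) -> str:
    """Dispatch an instruction, refusing edits outside the agent's scope."""
    if not target_path.startswith(agent.scope + "/"):
        return f"rejected: {agent.name} may only touch {agent.scope}/"
    return f"{agent.name} <- {instruction} ({target_path})"

sony = Agent("Sony", scope="backend")
blink = Agent("Blink", scope="frontend")

print(command_agent(sony, "backend/api.py", "implement Sora endpoints"))
print(command_agent(sony, "frontend/App.tsx", "tweak UI"))  # rejected
```

With scopes declared up front, a backend agent physically cannot clobber frontend files, so the collision seen in the video would be caught at dispatch time instead of after the fact.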
System Architecture Breakdown (9:36 - 13:45)
The video details the application architecture, starting with the input layer (engineer and agents), which accepts text or voice via the orchestrator agent running OpenAI's Realtime API. The system layer exposes the Realtime API's tools (e.g., list agents, create agent, command agent), which act on builder agents (Claude Code agents or browser agents); the Claude Code agents have access to Gemini 2.5 Computer Use for closed-loop validation. The orchestration layer is deliberately thin so it can adapt to any model or tool. The output layer produces audio, text, files, and side effects, which feed back in as inputs; the presenter stresses that these feedback loops are what make the system genuinely powerful. Meanwhile, the UI updates show Blink successfully restoring the desired frontend style after the earlier interference.
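The system-layer tools named above (list agents, create agent, command agent) would be exposed to the Realtime API as function-style tool definitions. The schemas below are a sketch inferred from the tool names mentioned in the video; the exact parameter shapes are assumptions.

```python
# Hypothetical tool definitions for the voice orchestrator, written in the
# function-calling schema style used by OpenAI APIs. The parameter shapes
# are guesses based on the tool names mentioned in the video.

ORCHESTRATOR_TOOLS = [
    {
        "type": "function",
        "name": "list_agents",
        "description": "Return the name and status of every active agent.",
        "parameters": {"type": "object", "properties": {}},
    },
    {
        "type": "function",
        "name": "create_agent",
        "description": "Spawn a new builder agent (Claude Code or browser).",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "kind": {"type": "string", "enum": ["claude_code", "browser"]},
            },
            "required": ["name", "kind"],
        },
    },
    {
        "type": "function",
        "name": "command_agent",
        "description": "Send a natural-language instruction to a named agent.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "instruction": {"type": "string"},
            },
            "required": ["name", "instruction"],
        },
    },
]

print([t["name"] for t in ORCHESTRATOR_TOOLS])
```

Keeping the orchestration layer down to a handful of generic tools like these is what makes it thin: swapping Claude Code for another builder agent changes the tool implementations, not the schemas the voice model sees.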
Agent Testing and Remixing (13:45 - 19:57)
The agents continue their work, with Blink updating the UI and Sony testing the backend. A new browser-use agent is created and instructed to test the app on localhost, generate an image prompt, and switch to Sora 2 Pro for an 8-second portrait video. A manual test produces a cat image, confirming the system works, and the browser agent then autonomously generates a video, demonstrating the end-to-end process. An attempt to remix a video crashes the voice orchestration agent, but the system quickly restarts and resumes work from logged files.
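The crash recovery described here implies some form of durable event log that a restarted orchestrator can replay. The sketch below shows one common way to do that with an append-only JSONL file; the event shape and function names are assumptions, not the video's implementation.

```python
# Sketch of crash recovery via an append-only event log: every agent event
# is persisted as one JSON line, so a restarted orchestrator can replay the
# file to rebuild its view of the world. Details here are illustrative.

import json
import os
import tempfile

def log_event(path: str, event: dict) -> None:
    """Append one JSON event per line (JSONL) so writes survive a crash."""
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")

def resume(path: str) -> dict:
    """Replay the log to recover each agent's last known status."""
    status = {}
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            status[event["agent"]] = event["status"]
    return status

log_path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
log_event(log_path, {"agent": "Sony", "status": "testing backend"})
log_event(log_path, {"agent": "Blink", "status": "updating UI"})
# ...orchestrator crashes during the remix attempt, then restarts...
print(resume(log_path))  # → {'Sony': 'testing backend', 'Blink': 'updating UI'}
```

Because each event is appended and flushed independently, a crash mid-session loses at most the event in flight, which is why the restart in the video can pick up where it left off.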
Conclusion and Tactical Agentic Coding (19:57 - 32:12)
The presenter summarizes the multi-agent system's capabilities, showcasing Gemini 2.5 for testing, multi-agent observability, and powerful Claude Code agents building the application. He reiterates that agents can continuously deploy, test, and validate their work. The use of specific commands like /plan and /build allows for concise natural language direction. The video concludes by promoting "Tactical Agentic Coding," a course for experienced engineers to master advanced agentic engineering, emphasizing that the engineer is the bottleneck, not the tools. The course aims to transform engineers into irreplaceable assets by teaching them to build systems that build systems, focusing on scaling compute and impact through multi-agent architectures. The early bird deal for the course is highlighted, along with a 30-day full refund guarantee.