Detailed Summary
Introduction and Housekeeping (0:08 - 2:39)
The session begins with housekeeping notes, including the webinar's estimated 45-minute duration, instructions for Q&A, and confirmation that a recording will be sent to registrants. Cornelia Davis, a Developer Advocate at Temporal, introduces herself, highlighting her background in distributed systems and microservices. Josh Smith, a Solution Architect at Temporal, also introduces himself, noting his two-year anniversary at Temporal and his experience in AI and building distributed systems.
Understanding Context Engineering (2:39 - 11:29)
This section defines context engineering as a superset of prompt engineering, focusing on managing all contextual information provided to an LLM, not just user prompts. It emphasizes the importance of structuring and curating this context to help LLMs make reasonable decisions and avoid issues like context poisoning, distraction, and clashes. The discussion highlights that context engineering is akin to a new form of programming, requiring careful consideration of what information to include and exclude from the LLM's context window. An image from Anthropic is used to illustrate the complexity of context engineering compared to simpler prompt engineering.
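The idea that the LLM's context is more than the user prompt can be made concrete with a small sketch. This is a plain, hypothetical Python example (not code from the webinar): the point is that the system prompt, tool specifications, retrieved reference material, and bounded conversation history are all curated into one context per turn.

```python
# Minimal sketch of context assembly. The context an LLM sees is far more
# than the user prompt: system instructions, tool specs, retrieved data,
# and (bounded) history all compete for the context window.
# All names here are illustrative, not any specific framework's API.

def build_context(system_prompt, tool_specs, retrieved_docs, history,
                  user_msg, max_history=20):
    """Assemble the full message list sent to an LLM for one turn."""
    messages = [{"role": "system", "content": system_prompt}]
    # Curate, don't dump: include only the most relevant retrieved documents,
    # to avoid context distraction and clashes.
    for doc in retrieved_docs[:3]:
        messages.append({"role": "system", "content": f"Reference: {doc}"})
    # Bound the conversation history rather than including all of it.
    messages.extend(history[-max_history:])
    messages.append({"role": "user", "content": user_msg})
    return {"messages": messages, "tools": tool_specs}

ctx = build_context(
    system_prompt="You resolve order-management issues.",
    tool_specs=[{"name": "check_inventory"}],
    retrieved_docs=["SKU-42 reorder threshold is 10 units."],
    history=[{"role": "user", "content": "Hi"}],
    user_msg="Why is order 7 delayed?",
)
```

Context engineering, in this framing, is deciding what goes into (and stays out of) `messages` on every turn.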
Common Context Engineering Struggles and Solutions (11:29 - 13:50)
The presentation outlines three key areas of context engineering that will be addressed: scoping agents, providing the right data at the right time, and managing conversation history. Josh Smith takes over to demonstrate the first two points, emphasizing the importance of giving agents focused tasks to ensure reliability.
Demo: Monolithic vs. Micro-Agents (13:50 - 29:35)
Josh demonstrates the pitfalls of monolithic agents by attempting to make a single agent perform multiple tasks. He first shows a working system with a micro-agent architecture for an order management system, where separate agents handle detection, analysis, planning, and reporting, with a human approval step. This system reliably identifies and resolves issues like low inventory or late payments. He then contrasts this by running a "monolith agent" that attempts to do everything. While the demo agent unexpectedly worked well, Josh explains that in previous tests, monolithic agents often became unpredictable, hallucinated problems, gave confusing responses, or were unsure about tool usage. He highlights that monolithic agents are harder to debug, more expensive to retry due to larger context windows, and lack crucial human-in-the-loop steps.
Design Guidance for Agents (29:35 - 35:36)
Josh provides design guidance based on his experience: agents should handle one specific task well (micro-agents), workflows should orchestrate multiple agents and their context, and agents should be given only the necessary tools and information. He strongly recommends separate planning, human-in-the-loop, and execution steps for better visibility and reliability. Creating reports is crucial for understanding agent actions, especially for monolithic agents. Temporal's child workflows or Nexus workflows are suggested for managing long-running sub-agents or tool calls. He concludes by stating that Temporal makes it easy to manage context effectively by returning only necessary information from activities.
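The guidance above can be sketched in plain Python. This is a hypothetical stand-in, not the webinar's demo code: each agent function represents one focused micro-agent (in Temporal these would be activities or child workflows invoked from an orchestrating workflow), and the approval callback stands in for a human-in-the-loop step between planning and execution.

```python
# Illustrative micro-agent orchestration with a human-approval gate.
# Each "agent" has one focused task and receives only the data it needs.
# In Temporal, run_order_management would be a workflow and each agent
# an activity or child workflow; these stubs just model the shape.

def detect_agent(orders):
    """Detection: find orders whose inventory is below threshold."""
    return [o for o in orders if o["inventory"] < o["threshold"]]

def analyze_agent(issue):
    """Analysis: determine the cause for one detected issue."""
    return {"order": issue["id"], "cause": "low inventory"}

def plan_agent(analysis):
    """Planning: propose a remediation action."""
    return {"order": analysis["order"], "action": "reorder stock"}

def report_agent(executed):
    """Reporting: summarize what was actually done."""
    return f"Resolved {len(executed)} issue(s)."

def run_order_management(orders, approve):
    issues = detect_agent(orders)
    plans = [plan_agent(analyze_agent(i)) for i in issues]
    # Separate planning from execution: a human approves each plan first.
    executed = [p for p in plans if approve(p)]
    return report_agent(executed)

orders = [{"id": 7, "inventory": 2, "threshold": 10},
          {"id": 8, "inventory": 50, "threshold": 10}]
print(run_order_management(orders, approve=lambda plan: True))
```

Because each step returns only what the next step needs, the orchestrator (not any single agent) owns the overall context, which is the design point being made.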
Managing Conversation History and Context Pruning (35:36 - 43:00)
Cornelia Davis takes over to discuss managing long-running agent conversations and the need for history pruning. She explains that excessive conversation history can lead to LLM confusion and slow response times. Summarization is introduced as a key pruning technique, where a long history is condensed into a summary to seed future conversations, often using a separate LLM. Other techniques include trimming the oldest parts of the history. Additional concerns like auditability, provenance, and the mechanics of when and how to prune are also highlighted.
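The two pruning techniques described above can be sketched as small functions. This is an illustrative example under stated assumptions: the summarizer is injected as a callable (standing in for a call to a separate LLM), and the message format is the usual role/content dictionary.

```python
# Sketch of two history-pruning techniques: trimming the oldest turns,
# and condensing them into a summary that seeds future turns.
# `summarize` is injected so a separate LLM can be used; here it's a stub.

def trim_history(history, keep_last):
    """Drop the oldest messages, keeping only the most recent turns."""
    return history[-keep_last:]

def summarize_history(history, keep_last, summarize):
    """Condense old turns into one summary message plus recent turns."""
    if len(history) <= keep_last:
        return history
    old, recent = history[:-keep_last], history[-keep_last:]
    summary = summarize(old)  # e.g., a call to a separate summarizer LLM
    return [{"role": "system",
             "content": f"Summary so far: {summary}"}] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
pruned = summarize_history(
    history, keep_last=4,
    summarize=lambda msgs: f"{len(msgs)} earlier turns")
```

Note that pruning the live context does not answer the auditability and provenance concerns by itself; the original history still needs to be retained somewhere, which is where the Temporal mechanics in the next section come in.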
Demo: Conversation History Management with Temporal (43:00 - 53:26)
Cornelia demonstrates a chat interface where an LLM acts as both the assistant and a simulated user. She shows how a long conversation history can lead to the LLM (in its user role) becoming confused and repetitive. To address this, she introduces a summarize command that, when triggered, uses Temporal's Continue-As-New feature: the current workflow run stops and a new one starts with the same workflow ID, but with the conversation history replaced by an LLM-generated summary. This preserves auditability of the original conversation while providing a concise context for the new workflow run. She also demonstrates a durable timer that automatically triggers summarization after a period of user inactivity, even if the worker is temporarily offline, showcasing Temporal's ability to manage long-running processes and context reliably.
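The pattern in this demo can be modeled in plain Python. This is a hedged sketch, not Temporal SDK code: `continue_as_new` here imitates what Temporal's Continue-As-New does (end the current run, start a fresh one seeded with only the summary, while the full prior history remains available for audit), and `should_summarize` stands in for a durable inactivity timer.

```python
# Plain-Python model of the Continue-As-New pattern for conversation
# history. In Temporal, ending a run and starting a new one with the same
# workflow ID (and a durable timer for inactivity) would be handled by
# the SDK; the class and method names below are illustrative only.
import time

class ConversationRun:
    def __init__(self, seed_history=None, inactivity_timeout=300.0):
        self.history = list(seed_history or [])
        self.inactivity_timeout = inactivity_timeout
        self.last_activity = time.monotonic()

    def add_turn(self, role, content):
        self.history.append({"role": role, "content": content})
        self.last_activity = time.monotonic()

    def should_summarize(self, now=None):
        """Stand-in for a durable inactivity timer firing."""
        now = time.monotonic() if now is None else now
        return now - self.last_activity >= self.inactivity_timeout

    def continue_as_new(self, summarize):
        """End this run; return (auditable old history, fresh run)."""
        summary = summarize(self.history)
        new_run = ConversationRun(
            seed_history=[{"role": "system",
                           "content": f"Summary: {summary}"}],
            inactivity_timeout=self.inactivity_timeout,
        )
        return list(self.history), new_run

run = ConversationRun(inactivity_timeout=1.0)
run.add_turn("user", "hello")
audit, fresh = run.continue_as_new(lambda h: f"{len(h)} turns")
```

The design point is the split: the old run's history survives for audit, while the new run's LLM only ever sees the compact summary.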
Key Takeaways and Q&A (53:26 - End)
Cornelia and Josh summarize the key takeaways: scoping micro-agents with Temporal activities or child workflows, managing durable and ephemeral context within workflows and activities, and using Continue-As-New and durable timers for conversation history management. They share resources, including a blog post and demo code repositories, and invite attendees to a future webinar on "Human in the Loop" on November 6th. During audience Q&A, they discuss how to prevent summarization from losing key details (using guardrails and evals) and the tools used in the demos (Temporal, Python, LiteLLM). They also explain how Temporal's workflow and worker architecture inherently handles concurrency and context isolation, preventing clashes between different agent conversations.