Agent Frameworks & Memory Roundtable
With Cloudflare's Matt Carey
Software Synthesis analyses the evolution of software companies in the age of AI - from how they're built and scaled, to how they go to market and create enduring value. You can reach me at akash@earlybird.com.
Gradient Descending Roundtables
November 26th: Open Source Models with Alibaba Qwen
This week, we hosted Matt from Cloudflare to discuss Agent Frameworks and Memory. Thanks to everyone who came and made the discussion so insightful!
I’m sharing the summary of our discussion below.
1. Framework Fatigue & The “Library vs Framework” Debate
Core tension: Most participants expressed frustration with bloated frameworks
General consensus: “Everyone’s trying to build frameworks and products, no one’s trying to build libraries”
Many have abandoned complex frameworks in favor of simpler approaches (OpenAI SDK, Anthropic SDK directly)
Matt noted this is his 4th agent framework - each iteration has reduced complexity
Current preferences:
Direct SDK usage (OpenAI/Anthropic) increasingly popular - see the minimal sketch after this list
Cloudflare’s Agents SDK seen as lighter-weight abstraction
LangGraph still in use, but developers report they keep “going back to deterministic workflows”
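To ground that, here’s what the “no framework” baseline can look like: a hand-rolled tool loop straight against the Anthropic TypeScript SDK. The read_file tool and model name are illustrative placeholders, and the OpenAI equivalent is near-identical.

```typescript
// Minimal "no framework" agent: a hand-rolled tool loop on the raw SDK.
// The read_file tool and model name are illustrative placeholders.
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "node:fs";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const tools: Anthropic.Tool[] = [
  {
    name: "read_file",
    description: "Read a UTF-8 text file from the local workspace",
    input_schema: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
];

async function runAgent(task: string) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: task }];
  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 1024,
      tools,
      messages,
    });
    if (response.stop_reason !== "tool_use") return response;

    // Execute each requested tool and feed results back as tool_result blocks.
    messages.push({ role: "assistant", content: response.content });
    const results: Anthropic.ToolResultBlockParam[] = response.content
      .filter((b): b is Anthropic.ToolUseBlock => b.type === "tool_use")
      .map((b) => ({
        type: "tool_result",
        tool_use_id: b.id,
        content: readFileSync((b.input as { path: string }).path, "utf8"),
      }));
    messages.push({ role: "user", content: results });
  }
}
```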
2. Code Mode
What is Code Mode?
Generating an SDK from the available tools, then having the LLM write code against that SDK instead of making direct tool calls (see the sketch after this list)
Cloudflare implementation uses dynamic loaders to run generated code in isolated workers
~1ms cold start times on V8 isolates (not full sandboxes)
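To make the idea concrete, a rough sketch of code mode under stated assumptions - this is not Cloudflare’s implementation, and every name in it is illustrative:

```typescript
// Rough sketch of code mode: render tool schemas into a typed SDK surface,
// then ask the model for one script instead of a long chain of tool calls.
interface ToolSchema {
  name: string;        // e.g. "listIssues"
  description: string;
  params: string;      // TypeScript type of the argument, e.g. "{ repo: string }"
  returns: string;     // TypeScript return type, e.g. "Issue[]"
}

// Emit a declaration-file-style surface for the model to code against; at
// runtime each call is proxied back to the real tool implementation.
function renderSdk(tools: ToolSchema[]): string {
  return tools
    .map(
      (t) =>
        `/** ${t.description} */\n` +
        `declare function ${t.name}(input: ${t.params}): Promise<${t.returns}>;`
    )
    .join("\n\n");
}

// The model sees the SDK once instead of 30 tool schemas on every turn; the
// script it returns is then executed in an isolated worker.
function codeModePrompt(tools: ToolSchema[], task: string): string {
  return (
    "Write a single TypeScript script using these functions. Intermediate\n" +
    "data flows between calls without being echoed back to you.\n\n" +
    renderSdk(tools) +
    `\n\nTask: ${task}`
  );
}
```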
Key advantages identified:
Massive token efficiency - compress 30 tools into one code-generation tool
Enables data to flow between tools without the model seeing it (like bash pipes)
Deterministic execution with compile-time validation
Can use minified code since it’s machine-executed
Open question Matt posed: Should this be framework-level abstraction or let developers implement themselves?
3. Memory & Context Management Strategies
Minimal RAG adoption: Only 1-2 participants using embeddings-based retrieval
One team: “glob and grab is unreasonably strong baseline” - hard to beat for code agents
When RAG is used: Hybrid approach with knowledge graphs + embeddings
Graph-based approaches:
Neo4j implementation by a Cisco team for platform engineering
Using LLMs to build knowledge graphs from documents
Challenges: injection attack concerns, access control complexity
Pattern: Find relevant node via embeddings → K-nearest neighbors traversal
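A hedged sketch of that pattern, assuming a Neo4j 5+ vector index named docEmbeddings over Doc nodes and some external embed() function - all names are illustrative, not the actual Cisco setup:

```typescript
// Vector lookup finds the entry node; a bounded traversal pulls in its
// neighbourhood. Index name, label, and embed() are illustrative assumptions.
import neo4j from "neo4j-driver";

declare function embed(text: string): Promise<number[]>; // any embedding API

const driver = neo4j.driver(
  "bolt://localhost:7687",
  neo4j.auth.basic("neo4j", "password")
);

async function retrieve(query: string, k = 5): Promise<string[]> {
  const session = driver.session();
  try {
    const result = await session.run(
      `CALL db.index.vector.queryNodes('docEmbeddings', 1, $vec)
       YIELD node
       MATCH (node)-[*1..2]-(neighbour:Doc)
       RETURN DISTINCT neighbour.text AS text
       LIMIT $k`,
      { vec: await embed(query), k: neo4j.int(k) }
    );
    return result.records.map((r) => r.get("text") as string);
  } finally {
    await session.close();
  }
}
```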
“Predictive context loading” - the most novel approach discussed (a sketch follows this list):
Track agent behaviour patterns across evals
Pre-load context based on statistical patterns (e.g., “if agent touches file X, 90% chance it needs files Y, Z next”)
Comparison to web prefetching/autocorrect
“Old school ML” middle layer between agent and tools
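A minimal sketch of the idea, assuming eval traces are just ordered lists of the files an agent touched (all names illustrative):

```typescript
// Mine eval traces for "after file X, the agent usually needs Y" patterns,
// then pre-load anything above a probability threshold.
type Trace = string[]; // ordered file paths from one agent session

function buildCooccurrence(traces: Trace[]): Map<string, Map<string, number>> {
  const counts = new Map<string, Map<string, number>>();
  for (const trace of traces) {
    for (let i = 0; i < trace.length - 1; i++) {
      const next = counts.get(trace[i]) ?? new Map<string, number>();
      next.set(trace[i + 1], (next.get(trace[i + 1]) ?? 0) + 1);
      counts.set(trace[i], next);
    }
  }
  return counts;
}

// When the agent opens `file`, pre-load anything that followed it in at least
// `threshold` of past sessions (0.9 for the "90% chance" case above).
function predictNext(
  counts: Map<string, Map<string, number>>,
  file: string,
  threshold = 0.9
): string[] {
  const next = counts.get(file);
  if (!next) return [];
  const total = [...next.values()].reduce((a, b) => a + b, 0);
  return [...next.entries()]
    .filter(([, n]) => n / total >= threshold)
    .map(([path]) => path);
}
```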
4. Session Management & Sandboxing
Pain points with Claude Code SDK:
“Very insistent on file system” - hard to extract/resume sessions
Teams need to fork agents thousands of times over months
Built custom session managers to work around limitations
Current approaches:
Micro-VMs (e.g., Firecracker via E2B) for code execution
Cloudflare’s Durable Objects - “distributed little objects with SQLite store”
First iteration: “Durable object as agent was a one-liner” (a sketch of the pattern follows)
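A minimal sketch of that pattern with a SQLite-backed Durable Object - the class and schema are invented for illustration; the Agents SDK layers its abstraction on top of objects like this:

```typescript
// One addressable Durable Object per agent session, with its own embedded
// SQLite store for conversation state. Schema is illustrative.
import { DurableObject } from "cloudflare:workers";

export class AgentSession extends DurableObject {
  constructor(ctx: DurableObjectState, env: unknown) {
    super(ctx, env);
    // SQLite-backed Durable Objects expose a synchronous SQL API.
    ctx.storage.sql.exec(
      "CREATE TABLE IF NOT EXISTS messages (role TEXT, content TEXT, ts INTEGER)"
    );
  }

  append(role: string, content: string): void {
    this.ctx.storage.sql.exec(
      "INSERT INTO messages (role, content, ts) VALUES (?, ?, ?)",
      role,
      content,
      Date.now()
    );
  }

  history(): { role: string; content: string }[] {
    return this.ctx.storage.sql
      .exec("SELECT role, content FROM messages ORDER BY ts")
      .toArray() as { role: string; content: string }[];
  }
}
```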
5. Tracing
Unanimous pain point: existing tools (LangSmith, Langfuse, etc.) are inadequate
Key limitations identified:
Built for simple LLM calls, not complex agent traces
Can’t visualise tree searches or test-time scaling
Fail for sessions spanning months with thousands of forks
“Not the same as distributed tracing for microservices”
Solutions:
Teams building custom tracing UIs (one possible trace shape is sketched after this list)
Atla fine-tuned a model specifically for analyzing agent traces against rubrics
Cloudflare just released tracing for Workers
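As an illustration of what those custom UIs have to model, a sketch of a trace record that supports forks and long-lived sessions - every field name here is an assumption, not any vendor’s schema:

```typescript
// A span model for forking, months-long agent sessions; flat LLM-call lists
// are exactly what breaks at this scale. Field names are illustrative.
interface AgentSpan {
  id: string;
  sessionId: string;         // stable across months of activity
  parentId: string | null;   // tool-call / sub-agent nesting
  forkedFrom: string | null; // span this session was forked from, if any
  kind: "llm_call" | "tool_call" | "handoff" | "fork";
  startedAt: number;
  endedAt: number;
  attrs: Record<string, unknown>; // model, tokens, tool name, etc.
}

// Rebuild the fork tree for a UI: group fork spans by their origin.
function forkTree(spans: AgentSpan[]): Map<string | null, AgentSpan[]> {
  const children = new Map<string | null, AgentSpan[]>();
  for (const span of spans) {
    if (span.kind !== "fork") continue;
    const siblings = children.get(span.forkedFrom) ?? [];
    siblings.push(span);
    children.set(span.forkedFrom, siblings);
  }
  return children;
}
```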
6. Optimisation Strategies & Architectural Patterns
Tool design:
Debate: Many simple tools vs. few complex tools with parameters
Calling LLMs inside tools “becomes painfully slow”
Context reduction: some teams use a separate, cheaper model to filter tools for relevance before the main agent sees them
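A hedged sketch of that pre-pass using the Anthropic SDK; the model name is illustrative and the selection format deliberately crude:

```typescript
// A small, cheap model picks which tool definitions the main agent should
// even see, cutting context before the expensive call.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function filterTools(
  task: string,
  tools: Anthropic.Tool[]
): Promise<Anthropic.Tool[]> {
  const listing = tools
    .map((t) => `- ${t.name}: ${t.description ?? ""}`)
    .join("\n");
  const response = await client.messages.create({
    model: "claude-3-5-haiku-latest", // illustrative small model
    max_tokens: 200,
    messages: [
      {
        role: "user",
        content:
          `Task: ${task}\n\nTools:\n${listing}\n\n` +
          "Reply with a comma-separated list of the tool names needed for this task, nothing else.",
      },
    ],
  });
  const block = response.content.find((b) => b.type === "text");
  const chosen = new Set(
    (block?.type === "text" ? block.text : "").split(",").map((s) => s.trim())
  );
  return tools.filter((t) => chosen.has(t.name));
}
```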
Multi-agent vs. Single-agent evolution:
Pattern: Teams started with “orchestrator + investigators + verifiers”
Newer models collapsing this: “Just pass everything to coding SDK - it’s comparable”
“Handoffs are much faster” than complex tool returns
Prompt engineering shifts:
“A year ago building complicated prompt workflows... now just ‘repo tool and guide’”
Less manual XML construction, more reliance on model intelligence
7. Production Patterns
Pre-determined flows for common patterns:
Recruiting agent example: a binary tree of prompts keyed on user requirements (sketched after this list)
Generate a “master prompt” from the selected sub-prompts
Reduces hallucination for structured interactions
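A minimal sketch of the binary-tree idea - the recruiting questions and node contents are invented for illustration:

```typescript
// Walk yes/no requirement checks, collecting sub-prompts along the path,
// then join them into one master prompt. Contents are illustrative.
interface Requirements {
  senior: boolean;
}

interface PromptNode {
  subPrompt: string;
  question?: (reqs: Requirements) => boolean; // absent on leaves
  yes?: PromptNode;
  no?: PromptNode;
}

const tree: PromptNode = {
  subPrompt: "You are a recruiting assistant.",
  question: (r) => r.senior,
  yes: { subPrompt: "Prioritise candidates with 8+ years of experience." },
  no: { subPrompt: "Prioritise learning potential over years of experience." },
};

function masterPrompt(root: PromptNode, reqs: Requirements): string {
  const parts: string[] = [];
  let node: PromptNode | undefined = root;
  while (node) {
    parts.push(node.subPrompt);
    node = node.question?.(reqs) ? node.yes : node.no;
  }
  return parts.join("\n");
}

// masterPrompt(tree, { senior: true }) selects the senior branch's sub-prompt.
```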
Latency optimisation:
Parallel tool calls causing provider rate limits/timeouts
Hard to control with external APIs
Moving toward edge execution for speed
8. MCP Discussion
When MCP makes sense:
Remote tool servers where contract flexibility matters
Tools that need to adapt per-user/session
“Much better transport layer than API” for dynamic use cases
When NOT to use MCP:
Local execution: “Never suggest MCP server + client on same machine - pointless”
Well-documented TypeScript APIs: “Just use the API directly”
Code mode might reduce MCP need, though Matt sees them as complementary
Critical distinction: MCP = discovery + transport, not execution. Code mode = execution optimisation.
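For the remote, dynamic case, a minimal client sketch with the official MCP TypeScript SDK (the server URL is illustrative):

```typescript
// MCP as discovery + transport: connect, list whatever tools the server
// exposes today, and invoke one by name. No tool contract is baked in.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

async function main() {
  const client = new Client({ name: "roundtable-demo", version: "1.0.0" });
  await client.connect(
    new StreamableHTTPClientTransport(new URL("https://example.com/mcp"))
  );

  // Discovery: the contract can change per user/session without redeploying clients.
  const { tools } = await client.listTools();
  console.log(tools.map((t) => t.name));

  // Transport: invoke by name with JSON arguments.
  const result = await client.callTool({ name: tools[0].name, arguments: {} });
  console.log(result);
}

main();
```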
Security Considerations
Injection attacks via LLM-generated graph queries (Neo4j) - see the mitigation sketch after this list
Static analysis on generated tool code
Cloudflare’s isolated execution model mitigates many concerns
Access control in multi-tenant knowledge graphs
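On the injection point, the baseline mitigation is the same as for SQL: parameterise rather than interpolate. A sketch with the Neo4j JavaScript driver (connection details illustrative):

```typescript
// Never splice model output into Cypher text; pass it as a parameter so it
// is never parsed as part of the query.
import neo4j from "neo4j-driver";

const driver = neo4j.driver(
  "bolt://localhost:7687",
  neo4j.auth.basic("neo4j", "password")
);

async function findDocs(modelSuppliedTopic: string): Promise<string[]> {
  const session = driver.session();
  try {
    // BAD:  `MATCH (d:Doc {topic: '${modelSuppliedTopic}'})` - crafted input
    //       can rewrite the query.
    // GOOD: parameters travel out-of-band and are treated purely as values.
    const result = await session.run(
      "MATCH (d:Doc {topic: $topic}) RETURN d.title AS title",
      { topic: modelSuppliedTopic }
    );
    return result.records.map((r) => r.get("title") as string);
  } finally {
    await session.close();
  }
}
```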
Have any feedback? Email me at akash@earlybird.com.