Agent Frameworks & Memory Roundtable
With Cloudflare's Matt Carey
Software Synthesis analyses the evolution of software companies in the age of AI - from how they're built and scaled, to how they go to market and create enduring value. You can reach me at akash@earlybird.com.
Gradient Descending Roundtables
November 26th: Open Source Models with Alibaba Qwen
This week, we hosted Matt from Cloudflare to discuss Agent Frameworks and Memory. Thanks to everyone who came and made the discussion so insightful!
I’m sharing the summary of our discussion below.
1. Framework Fatigue & The “Library vs Framework” Debate
Core tension: Most participants expressed frustration with bloated frameworks
General consensus: “Everyone’s trying to build frameworks and products, no one’s trying to build libraries”
Many have abandoned complex frameworks in favor of simpler approaches (OpenAI SDK, Anthropic SDK directly)
Matt noted this is his 4th agent framework - each iteration has reduced complexity
Current preferences:
Direct SDK usage (OpenAI/Anthropic) increasingly popular - see the minimal sketch after this list
Cloudflare’s Agents SDK seen as lighter-weight abstraction
LangGraph still in use, but developers report they keep “going back to deterministic workflows”
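To ground that, here’s what the “no framework” baseline can look like: a hand-rolled tool loop straight against the Anthropic TypeScript SDK. The read_file tool and model name are illustrative placeholders, and the OpenAI equivalent is near-identical.

```typescript
// Minimal "no framework" agent: a hand-rolled tool loop on the raw SDK.
// The read_file tool and model name are illustrative placeholders.
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "node:fs";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const tools: Anthropic.Tool[] = [
  {
    name: "read_file",
    description: "Read a UTF-8 text file from the local workspace",
    input_schema: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
];

async function runAgent(task: string) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: task }];
  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 1024,
      tools,
      messages,
    });
    if (response.stop_reason !== "tool_use") return response;

    // Execute each requested tool and feed results back as tool_result blocks.
    messages.push({ role: "assistant", content: response.content });
    const results: Anthropic.ToolResultBlockParam[] = response.content
      .filter((b): b is Anthropic.ToolUseBlock => b.type === "tool_use")
      .map((b) => ({
        type: "tool_result",
        tool_use_id: b.id,
        content: readFileSync((b.input as { path: string }).path, "utf8"),
      }));
    messages.push({ role: "user", content: results });
  }
}
```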
2. Code Mode
What is Code Mode?
Generating an SDK from the available tools, then having the LLM write code against that SDK instead of making direct tool calls (see the sketch after this list)
Cloudflare implementation uses dynamic loaders to run generated code in isolated workers
~1ms cold start times on V8 isolates (not full sandboxes)
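To make the idea concrete, a rough sketch of code mode under stated assumptions - this is not Cloudflare’s implementation, and every name in it is illustrative:

```typescript
// Rough sketch of code mode: render tool schemas into a typed SDK surface,
// then ask the model for one script instead of a long chain of tool calls.
interface ToolSchema {
  name: string;        // e.g. "listIssues"
  description: string;
  params: string;      // TypeScript type of the argument, e.g. "{ repo: string }"
  returns: string;     // TypeScript return type, e.g. "Issue[]"
}

// Emit a declaration-file-style surface for the model to code against; at
// runtime each call is proxied back to the real tool implementation.
function renderSdk(tools: ToolSchema[]): string {
  return tools
    .map(
      (t) =>
        `/** ${t.description} */\n` +
        `declare function ${t.name}(input: ${t.params}): Promise<${t.returns}>;`
    )
    .join("\n\n");
}

// The model sees the SDK once instead of 30 tool schemas on every turn; the
// script it returns is then executed in an isolated worker.
function codeModePrompt(tools: ToolSchema[], task: string): string {
  return (
    "Write a single TypeScript script using these functions. Intermediate\n" +
    "data flows between calls without being echoed back to you.\n\n" +
    renderSdk(tools) +
    `\n\nTask: ${task}`
  );
}
```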
Key advantages identified:
Massive token efficiency - compress 30 tools into one code-generation tool
Enables data to flow between tools without the model seeing it (like bash pipes)
Deterministic execution with compile-time validation
Can use minified code since it’s machine-executed
Open question Matt posed: Should this be framework-level abstraction or let developers implement themselves?
3. Memory & Context Management Strategies
Minimal RAG adoption: Only 1-2 participants using embeddings-based retrieval
One team: “glob and grab is unreasonably strong baseline” - hard to beat for code agents
When RAG is used: Hybrid approach with knowledge graphs + embeddings
Graph-based approaches:
Neo4j implementation by a Cisco team for platform engineering
Using LLMs to build knowledge graphs from documents
Challenges: injection attack concerns, access control complexity
Pattern: Find relevant node via embeddings → K-nearest neighbors traversal
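A hedged sketch of that pattern, assuming a Neo4j 5+ vector index named docEmbeddings over Doc nodes and some external embed() function - all names are illustrative, not the actual Cisco setup:

```typescript
// Vector lookup finds the entry node; a bounded traversal pulls in its
// neighbourhood. Index name, label, and embed() are illustrative assumptions.
import neo4j from "neo4j-driver";

declare function embed(text: string): Promise<number[]>; // any embedding API

const driver = neo4j.driver(
  "bolt://localhost:7687",
  neo4j.auth.basic("neo4j", "password")
);

async function retrieve(query: string, k = 5): Promise<string[]> {
  const session = driver.session();
  try {
    const result = await session.run(
      `CALL db.index.vector.queryNodes('docEmbeddings', 1, $vec)
       YIELD node
       MATCH (node)-[*1..2]-(neighbour:Doc)
       RETURN DISTINCT neighbour.text AS text
       LIMIT $k`,
      { vec: await embed(query), k: neo4j.int(k) }
    );
    return result.records.map((r) => r.get("text") as string);
  } finally {
    await session.close();
  }
}
```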
“Predictive context loading” - the most novel approach discussed (a sketch follows this list):
Track agent behaviour patterns across evals
Pre-load context based on statistical patterns (e.g., “if agent touches file X, 90% chance it needs files Y, Z next”)
Comparison to web prefetching/autocorrect
“Old school ML” middle layer between agent and tools
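A minimal sketch of the idea, assuming eval traces are just ordered lists of the files an agent touched (all names illustrative):

```typescript
// Mine eval traces for "after file X, the agent usually needs Y" patterns,
// then pre-load anything above a probability threshold.
type Trace = string[]; // ordered file paths from one agent session

function buildCooccurrence(traces: Trace[]): Map<string, Map<string, number>> {
  const counts = new Map<string, Map<string, number>>();
  for (const trace of traces) {
    for (let i = 0; i < trace.length - 1; i++) {
      const next = counts.get(trace[i]) ?? new Map<string, number>();
      next.set(trace[i + 1], (next.get(trace[i + 1]) ?? 0) + 1);
      counts.set(trace[i], next);
    }
  }
  return counts;
}

// When the agent opens `file`, pre-load anything that followed it in at least
// `threshold` of past sessions (0.9 for the "90% chance" case above).
function predictNext(
  counts: Map<string, Map<string, number>>,
  file: string,
  threshold = 0.9
): string[] {
  const next = counts.get(file);
  if (!next) return [];
  const total = [...next.values()].reduce((a, b) => a + b, 0);
  return [...next.entries()]
    .filter(([, n]) => n / total >= threshold)
    .map(([path]) => path);
}
```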
4. Session Management & Sandboxing
Pain points with Claude Code SDK:
“Very insistent on file system” - hard to extract/resume sessions
Teams need to fork agents thousands of times over months
Built custom session managers to work around limitations
Current approaches:
Micro-VMs (e.g., Firecracker via E2B) for code execution
Cloudflare’s Durable Objects - “distributed little objects with SQLite store”
First iteration: “Durable object as agent was a one-liner” (a sketch of the pattern follows)
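A minimal sketch of that pattern with a SQLite-backed Durable Object - the class and schema are invented for illustration; the Agents SDK layers its abstraction on top of objects like this:

```typescript
// One addressable Durable Object per agent session, with its own embedded
// SQLite store for conversation state. Schema is illustrative.
import { DurableObject } from "cloudflare:workers";

export class AgentSession extends DurableObject {
  constructor(ctx: DurableObjectState, env: unknown) {
    super(ctx, env);
    // SQLite-backed Durable Objects expose a synchronous SQL API.
    ctx.storage.sql.exec(
      "CREATE TABLE IF NOT EXISTS messages (role TEXT, content TEXT, ts INTEGER)"
    );
  }

  append(role: string, content: string): void {
    this.ctx.storage.sql.exec(
      "INSERT INTO messages (role, content, ts) VALUES (?, ?, ?)",
      role,
      content,
      Date.now()
    );
  }

  history(): { role: string; content: string }[] {
    return this.ctx.storage.sql
      .exec("SELECT role, content FROM messages ORDER BY ts")
      .toArray() as { role: string; content: string }[];
  }
}
```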
5. Tracing
Unanimous pain point: existing tools (LangSmith, Langfuse, etc.) are inadequate
Key limitations identified:
Built for simple LLM calls, not complex agent traces
Can’t visualise tree searches or test-time scaling
Fail for sessions spanning months with thousands of forks
“Not the same as distributed tracing for microservices”
Solutions:
Teams building custom tracing UIs (one possible trace shape is sketched after this list)
Atla fine-tuned a model specifically for analyzing agent traces against rubrics
Cloudflare just released tracing for Workers
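As an illustration of what those custom UIs have to model, a sketch of a trace record that supports forks and long-lived sessions - every field name here is an assumption, not any vendor’s schema:

```typescript
// A span model for forking, months-long agent sessions; flat LLM-call lists
// are exactly what breaks at this scale. Field names are illustrative.
interface AgentSpan {
  id: string;
  sessionId: string;         // stable across months of activity
  parentId: string | null;   // tool-call / sub-agent nesting
  forkedFrom: string | null; // span this session was forked from, if any
  kind: "llm_call" | "tool_call" | "handoff" | "fork";
  startedAt: number;
  endedAt: number;
  attrs: Record<string, unknown>; // model, tokens, tool name, etc.
}

// Rebuild the fork tree for a UI: group fork spans by their origin.
function forkTree(spans: AgentSpan[]): Map<string | null, AgentSpan[]> {
  const children = new Map<string | null, AgentSpan[]>();
  for (const span of spans) {
    if (span.kind !== "fork") continue;
    const siblings = children.get(span.forkedFrom) ?? [];
    siblings.push(span);
    children.set(span.forkedFrom, siblings);
  }
  return children;
}
```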
6. Optimisation Strategies & Architectural Patterns
Tool design:
Debate: Many simple tools vs. few complex tools with parameters
Calling LLMs inside tools “becomes painfully slow”
Context reduction: some teams use a separate, cheaper model to filter tools for relevance before the main agent sees them
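A hedged sketch of that pre-pass using the Anthropic SDK; the model name is illustrative and the selection format deliberately crude:

```typescript
// A small, cheap model picks which tool definitions the main agent should
// even see, cutting context before the expensive call.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function filterTools(
  task: string,
  tools: Anthropic.Tool[]
): Promise<Anthropic.Tool[]> {
  const listing = tools
    .map((t) => `- ${t.name}: ${t.description ?? ""}`)
    .join("\n");
  const response = await client.messages.create({
    model: "claude-3-5-haiku-latest", // illustrative small model
    max_tokens: 200,
    messages: [
      {
        role: "user",
        content:
          `Task: ${task}\n\nTools:\n${listing}\n\n` +
          "Reply with a comma-separated list of the tool names needed for this task, nothing else.",
      },
    ],
  });
  const block = response.content.find((b) => b.type === "text");
  const chosen = new Set(
    (block?.type === "text" ? block.text : "").split(",").map((s) => s.trim())
  );
  return tools.filter((t) => chosen.has(t.name));
}
```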
Multi-agent vs. Single-agent evolution:
Pattern: Teams started with “orchestrator + investigators + verifiers”
Newer models collapsing this: “Just pass everything to coding SDK - it’s comparable”
“Handoffs are much faster” than complex tool returns
Prompt engineering shifts:
“A year ago building complicated prompt workflows... now just ‘repo tool and guide’”
Less manual XML construction, more reliance on model intelligence
7. Production Patterns
Pre-determined flows for common patterns:
Recruiting agent example: a binary tree of prompts keyed on user requirements (sketched after this list)
Generate a “master prompt” from the selected sub-prompts
Reduces hallucination for structured interactions
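A minimal sketch of the binary-tree idea - the recruiting questions and node contents are invented for illustration:

```typescript
// Walk yes/no requirement checks, collecting sub-prompts along the path,
// then join them into one master prompt. Contents are illustrative.
interface Requirements {
  senior: boolean;
}

interface PromptNode {
  subPrompt: string;
  question?: (reqs: Requirements) => boolean; // absent on leaves
  yes?: PromptNode;
  no?: PromptNode;
}

const tree: PromptNode = {
  subPrompt: "You are a recruiting assistant.",
  question: (r) => r.senior,
  yes: { subPrompt: "Prioritise candidates with 8+ years of experience." },
  no: { subPrompt: "Prioritise learning potential over years of experience." },
};

function masterPrompt(root: PromptNode, reqs: Requirements): string {
  const parts: string[] = [];
  let node: PromptNode | undefined = root;
  while (node) {
    parts.push(node.subPrompt);
    node = node.question?.(reqs) ? node.yes : node.no;
  }
  return parts.join("\n");
}

// masterPrompt(tree, { senior: true }) selects the senior branch's sub-prompt.
```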
Latency optimisation:
Parallel tool calls causing provider rate limits/timeouts
Hard to control with external APIs
Moving toward edge execution for speed
8. MCP Discussion
When MCP makes sense:
Remote tool servers where contract flexibility matters
Tools that need to adapt per-user/session
“Much better transport layer than API” for dynamic use cases
When NOT to use MCP:
Local execution: “Never suggest MCP server + client on same machine - pointless”
Well-documented TypeScript APIs: “Just use the API directly”
Code mode might reduce MCP need, though Matt sees them as complementary
Critical distinction: MCP = discovery + transport, not execution. Code mode = execution optimisation.
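For the remote, dynamic case, a minimal client sketch with the official MCP TypeScript SDK (the server URL is illustrative):

```typescript
// MCP as discovery + transport: connect, list whatever tools the server
// exposes today, and invoke one by name. No tool contract is baked in.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

async function main() {
  const client = new Client({ name: "roundtable-demo", version: "1.0.0" });
  await client.connect(
    new StreamableHTTPClientTransport(new URL("https://example.com/mcp"))
  );

  // Discovery: the contract can change per user/session without redeploying clients.
  const { tools } = await client.listTools();
  console.log(tools.map((t) => t.name));

  // Transport: invoke by name with JSON arguments.
  const result = await client.callTool({ name: tools[0].name, arguments: {} });
  console.log(result);
}

main();
```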
Security Considerations
Injection attacks via LLM-generated graph queries (Neo4j) - see the mitigation sketch after this list
Static analysis on generated tool code
Cloudflare’s isolated execution model mitigates many concerns
Access control in multi-tenant knowledge graphs
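On the injection point, the baseline mitigation is the same as for SQL: parameterise rather than interpolate. A sketch with the Neo4j JavaScript driver (connection details illustrative):

```typescript
// Never splice model output into Cypher text; pass it as a parameter so it
// is never parsed as part of the query.
import neo4j from "neo4j-driver";

const driver = neo4j.driver(
  "bolt://localhost:7687",
  neo4j.auth.basic("neo4j", "password")
);

async function findDocs(modelSuppliedTopic: string): Promise<string[]> {
  const session = driver.session();
  try {
    // BAD:  `MATCH (d:Doc {topic: '${modelSuppliedTopic}'})` - crafted input
    //       can rewrite the query.
    // GOOD: parameters travel out-of-band and are treated purely as values.
    const result = await session.run(
      "MATCH (d:Doc {topic: $topic}) RETURN d.title AS title",
      { topic: modelSuppliedTopic }
    );
    return result.records.map((r) => r.get("title") as string);
  } finally {
    await session.close();
  }
}
```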
Have any feedback? Email me at akash@earlybird.com.