Hey hey,
Usually, around the third month, almost every AI product team hits the same failure mode: the agent starts forgetting things.
It might not forget everything, but it reliably forgets the inconvenient things.
It remembers you asked about invoices last turn, but has no idea you told it you hate PDF exports three sessions ago. It might give different answers to the same question depending on where the relevant context sits in the conversation.
Support tickets start arriving. The team blames the model.
Someone suggests a bigger context window. Nothing changes. But the real problem is not the model. It is that the team built the wrong kind of memory for the job.
Many, when they think about agent memory, picture the conversation history, the list of messages passed into the LLM on every turn. That is part of it.
But it is maybe 10% of what agent memory actually means. There are four different memory types that every production AI agent relies on. Each one stores different information, works at a different speed, and breaks in completely different ways.
Conflating them is why agents feel dumb at the worst moments.
A PM who understands these four can have a real conversation with their engineering team about why the agent is failing and what to do about it.
The Four Types
The framework comes from a 2024 paper by Sumers, Yao, Narasimhan, and Griffiths: CoALA (Cognitive Architectures for Language Agents).
LangChain uses the same taxonomy in its production memory systems. It maps AI agent memory to how human cognition works, which is what makes it stick.

The CoALA memory taxonomy, used by LangChain as the foundation for production agent memory systems. (source)
Type 1: In-Context Memory (Working Memory)
This is the conversation history.
It includes everything currently loaded into the model's active context window, such as the messages, tool results, system prompt, and retrieved documents.
The LLM only ever "sees" what is in this window.
Think of it as RAM. It's fast, immediate, powerful, and capped.
Modern frontier models have large context windows (Claude 3.5 Sonnet supports 200,000 tokens), but context windows are not infinite.
Large contexts are expensive, and more importantly, they reset every session.
When you start a new conversation, in-context memory empties fully. This is the type everyone focuses on, because it is the most visible.
You can literally read it. Most early agent prototypes try to solve all memory problems by pushing more into the context.
That works until it does not. As conversations grow longer, the model's ability to attend to information from the very start degrades.
Not a hard cutoff, just a gradual loss of thread. And across sessions, in-context memory offers nothing. Your agent has no idea who it is talking to on day two.
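A minimal sketch of what working memory amounts to in code: the message list sent to the model each turn, trimmed to a token budget. The budget number and the four-characters-per-token estimate are illustrative; real systems use the model's tokenizer.

```python
# In-context memory is just the message list the model sees this turn.
# Anything trimmed out of it is invisible to the model.

def estimate_tokens(text: str) -> int:
    # Rough heuristic; real systems use the model's tokenizer.
    return max(1, len(text) // 4)

def build_context(system_prompt: str, history: list[dict], budget: int = 1000) -> list[dict]:
    """Keep the system prompt plus as many recent turns as fit the budget."""
    used = estimate_tokens(system_prompt)
    kept: list[dict] = []
    for msg in reversed(history):          # walk newest-first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break                          # older turns silently fall away
        kept.append(msg)
        used += cost
    return [{"role": "system", "content": system_prompt}] + list(reversed(kept))
```

The trimming is exactly why day-two amnesia happens: once a turn falls outside the returned list, the model has no access to it, no matter how large the window.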
Type 2: Episodic Memory (Past Experiences)
This is where the agent stores records of things that have happened.
These are specific interactions, past action sequences, and records of how a particular type of task was handled before. The typical implementation is few-shot example retrieval.
When the agent faces a new task, it checks past experiences for relevant examples, pulls the most similar ones, and injects them into the context as demonstrations.
The agent effectively shows itself how it solved a similar problem before.
LangChain's research agent Unify follows this pattern. It stores past research approaches and retrieves them when a new query looks similar.
The result is an agent that improves at its job over time because it keeps getting better examples of itself working correctly. The failure mode is retrieval quality.
Episodic memory only helps if the right past experience is retrieved at the right time. If you retrieve a vaguely similar one, you feed the agent a bad example.
Many teams add episodic memory late in the build cycle and find their experience store is full of failed attempts and edge cases. Those are the wrong examples to learn from.
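A sketch of the episodic pattern, with retrieval quality made explicit. Real systems score similarity with vector embeddings; the word-overlap score here is a self-contained stand-in, and the episodes are invented for illustration.

```python
# Episodic memory: retrieve the most similar past task and inject it as a
# few-shot demonstration. A similarity threshold guards the failure mode:
# a vaguely similar episode is worse than no example at all.

def similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def retrieve_episode(task: str, episodes: list[dict], min_score: float = 0.2):
    """Return the best past (task, solution) pair, or None if nothing is close."""
    best = max(episodes, key=lambda e: similarity(task, e["task"]), default=None)
    if best and similarity(task, best["task"]) >= min_score:
        return best
    return None

episodes = [
    {"task": "fix csv export error", "solution": "re-run export with utf-8 encoding"},
    {"task": "reset user password", "solution": "send reset link via email"},
]
match = retrieve_episode("export error in csv file", episodes)
```

The `min_score` cutoff is the design decision that matters: returning nothing forces the agent to reason from scratch, which beats anchoring it on a bad demonstration.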
Type 3: Semantic Memory (What the Agent Knows)
This is the agent's ongoing knowledge store.
It carries information about users, entities, policies, and the world that should persist across sessions and inform future responses.
The implementation is usually a database, often a vector store, that the agent writes to and reads from. When a user mentions they prefer concise summaries, the agent extracts that preference and writes it to semantic memory.
In the next session, it retrieves that information before responding.
The user sees it as the agent "remembering" who they are. This is what people mean when they say they want their agent to "remember things."
Replit's coding agent, for example, might store the Python libraries a user works with most often, even if it remembers nothing else about past conversations.
Here, the failure mode is staleness and contradiction. Semantic memory grows over time and will have outdated or contradictory facts.
LangGraph has two approaches:
The profile model (a single continuously-updated document per user)
The collection model (many narrower documents updated independently).
The profile model becomes unmanageable as it grows. The collection model shifts sophistication to searching and deduplication.
Either way, unmanaged semantic memory works against you. The agent retrieves a preference from eighteen months ago that is no longer true and acts on it with full confidence.
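A sketch of the collection model with the staleness problem addressed head-on: each fact carries a timestamp, and retrieval prefers the newest value when facts contradict. The user and keys are illustrative.

```python
# The "collection" model of semantic memory: many small facts updated
# independently. Timestamping each write lets reads resolve contradictions
# by recency, instead of acting on an eighteen-month-old preference.

from datetime import datetime

class SemanticMemory:
    def __init__(self):
        self._facts: list[dict] = []

    def write(self, user: str, key: str, value: str, at: datetime):
        self._facts.append({"user": user, "key": key, "value": value, "at": at})

    def read(self, user: str, key: str):
        """Latest value wins when facts contradict each other."""
        matches = [f for f in self._facts if f["user"] == user and f["key"] == key]
        return max(matches, key=lambda f: f["at"])["value"] if matches else None

mem = SemanticMemory()
mem.write("ada", "summary_style", "detailed", at=datetime(2024, 1, 1))
mem.write("ada", "summary_style", "concise", at=datetime(2025, 6, 1))  # preference changed
```

Recency is the simplest resolution policy; production systems layer on deduplication and explicit invalidation, but the timestamp is the minimum that keeps the store honest.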
Type 4: Procedural Memory (How the Agent Behaves)
This is the strangest one, because it does not feel like a memory in the conventional sense. Procedural memory is what makes the agent behave the way it does.
It involves the system prompt, agent code, and model weights.
It is implicit knowledge about how to act.
Think about how a human learns to ride a bike. They do not consciously recall a set of instructions on every turn. The knowledge is embedded.
Procedural memory works the same way: encoded in the instructions and, at the deepest level, in the fine-tuned weights of the model itself.
The most accessible version for product teams is the system prompt.
When you tell the agent "always cite sources" or "respond formally to enterprise accounts," you are writing procedural memory. When your team fine-tunes a model on domain data, you are updating it at a deeper level.
Changing procedural memory is slow and deliberate. You cannot hot-update a fine-tuned model in response to a single interaction.
Because it is expensive to change, it is also the most stable. Behavioural rules encoded here will impact every session, user, and context.
The failure mode is drift. Prompts grow organically, get edited by multiple team members, and eventually lose internal consistency.
The agent receives contradictory instructions, and nobody audits the system prompt as a document. Treat the system prompt with the same rigour as source code: versioned, reviewed, and tested.
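What "versioned, reviewed, and tested" can mean in the smallest possible form: the behavioural rules live in one reviewed list with a version string, and the prompt is assembled from it rather than edited freehand. The rules and version here are illustrative.

```python
# Procedural memory as versioned source: the system prompt is built from a
# single reviewed rule list, so every edit is visible in diffs and the
# deployed version is auditable.

RULES_VERSION = "2.3.0"
RULES = [
    "Always cite sources.",
    "Respond formally to enterprise accounts.",
    "Never speculate about unreleased features.",
]

def build_system_prompt(version: str = RULES_VERSION) -> str:
    header = f"# policy v{version}"
    return "\n".join([header] + [f"- {rule}" for rule in RULES])
```

The point is not the string formatting; it is that a prompt assembled from one canonical list can be diffed, reviewed, and regression-tested like any other code path.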

Hot path memory (left) updates during the response cycle.
Background memory (right) updates asynchronously. (source)
Where Teams Go Wrong
The default failure is using in-context memory for everything, because it is easy to reach for. A user says, "I prefer bullet points."
The instinct is: put that in the system prompt, or hope the conversation history carries it forward. Neither is right. A user preference is semantic memory.
It should be extracted, stored, and retrieved across sessions.
Baking individual preferences into a shared system prompt pollutes it.
Conversation history forgets when a new session starts. The question for any agent feature is: Which type of memory is this information?
Needed for this turn only? In-context memory.
A record of something that happened? Episodic.
A fact about a user or entity that should persist? Semantic.
A rule about how the agent should behave everywhere? Procedural.
Each answer points to a different storage mechanism, a different retrieval strategy, and a different failure mode to design against.
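The routing question above can be sketched as a single function. The attribute names are invented for illustration; the point is that the decision is explicit rather than defaulting everything into the context window.

```python
# Route a new piece of information to the memory layer it belongs in,
# following the four questions above. Checks are ordered from the most
# global scope (procedural) to the narrowest (this turn only).

def route(info: dict) -> str:
    if info.get("behaviour_rule"):   # applies to every session, user, context
        return "procedural"
    if info.get("persists"):         # fact about a user or entity
        return "semantic"
    if info.get("is_event"):         # record of something that happened
        return "episodic"
    return "in_context"              # needed for this turn only
```

A user saying "I prefer bullet points" would arrive here with `persists=True` and land in semantic memory, not the system prompt.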
The Two Clocks
There is one more difference that changes how teams build. LangChain calls it hot path versus background memory updating.
Hot path means the agent updates memory synchronously during the response cycle. Before replying, it extracts and stores a new fact.

This guarantees freshness but adds latency to every turn. Background means memory is updated asynchronously after the response is sent.
A separate process writes to the store later, sometimes 30 minutes later. Faster for the user, but a brief lag before new information is available.
For consumer-facing agents, background memory is usually the right default. For high-stakes enterprise agents where the latest information matters immediately, hot path is worth the cost.
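The two clocks can be sketched with a queue and a worker thread. The hot-path version writes the fact before the reply goes out; the background version enqueues it and lets a separate worker write it after. All names are illustrative.

```python
# Hot path vs background memory updates. Hot path trades latency for
# freshness; background trades a brief staleness window for a faster reply.

import queue
import threading

store: dict[str, str] = {}
pending: "queue.Queue[tuple[str, str]]" = queue.Queue()

def respond_hot_path(user_msg: str) -> str:
    store["last_topic"] = user_msg          # synchronous write: adds latency to the turn
    return f"reply to: {user_msg}"

def respond_background(user_msg: str) -> str:
    pending.put(("last_topic", user_msg))   # enqueue; the reply goes out first
    return f"reply to: {user_msg}"

def background_worker():
    while True:
        key, value = pending.get()
        store[key] = value                  # memory becomes visible only later
        pending.task_done()

threading.Thread(target=background_worker, daemon=True).start()
```

With the background version, a follow-up question that arrives before the worker drains the queue will not see the new fact, which is exactly the staleness window the article describes.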
What This Looks Like in Practice
Take a customer support agent for a SaaS product.
In-context memory will store the current conversation. The agent knows you said "I'm getting an error on export" two messages ago.
Semantic memory stores the fact that this user is on the Enterprise plan, has raised three tickets in the last quarter, and prefers email follow-ups.
Retrieved before the session even began. Episodic memory holds past examples of how similar export errors were successfully resolved. These are injected as demonstrations so the agent approaches this one the same way.
Procedural memory is the system prompt: escalate billing disputes above £500, never speculate about unreleased features, always end with a resolution check.
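How the four layers stack inside one context window can be sketched directly from this support-agent example. Every string here is illustrative.

```python
# All four memory types, layered into the single message list the model sees.

def assemble_context(procedural: str, semantic: list[str],
                     episodic: list[str], conversation: list[dict]) -> list[dict]:
    """Layer procedural rules, semantic facts, and episodic examples into the
    system message, then append the in-context conversation."""
    system = "\n\n".join([
        procedural,                                        # behavioural rules
        "Known about this user:\n" + "\n".join(semantic),  # persistent facts
        "Similar past resolutions:\n" + "\n".join(episodic),  # demonstrations
    ])
    return [{"role": "system", "content": system}] + conversation

messages = assemble_context(
    procedural="Escalate billing disputes above £500. End with a resolution check.",
    semantic=["Enterprise plan", "Three tickets last quarter", "Prefers email follow-ups"],
    episodic=["Export error resolved by re-running the export with utf-8 encoding"],
    conversation=[{"role": "user", "content": "I'm getting an error on export"}],
)
```

The model never knows which layer a given line came from; it sees one flat context. The layering is the team's job, which is why a gap in any one store shows up as "the agent forgot."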

All four are working simultaneously. The agent does not choose which memory to use. It uses all of them, layered together inside that single context window. Your job as the PM is to ensure each layer is populated and maintained well enough that they stack cleanly.
When teams say "the agent is forgetting things," one of those layers has a gap. The memory did not get written, the retrieval did not surface it, or the system prompt contradicts itself.
Identifying which layer broke is the first step to fixing it, and that is a much more useful conversation than "can we increase the context window."
That's it for today. See you in the next one.
— Sid
