Memory Systems and State Management

Persistent Context, Knowledge Storage, and State Persistence

Intelligent Memory Management

Memory systems enable agents to maintain context across interactions, learn from experience, and retrieve relevant information efficiently, supporting better decision-making and continuity.

Short-Term Memory
Temporary storage for current conversation context, recent interactions, and working information that needs to be readily accessible during active sessions.
  • Conversation buffer management
  • Session-based context retention
  • Fast access and retrieval
  • Automatic cleanup and expiration
  • Token limit management
from langchain.memory import (
    ConversationBufferMemory,
    ConversationBufferWindowMemory,
    ConversationSummaryBufferMemory,
)

# Basic conversation buffer: stores the full history verbatim
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

# Window-based memory: keeps only the most recent exchanges
window_memory = ConversationBufferWindowMemory(
    k=5,  # Keep last 5 exchanges
    return_messages=True
)

# Summary-buffer memory for long conversations: once the buffer
# exceeds the token limit, older turns are condensed into a summary
summary_memory = ConversationSummaryBufferMemory(
    llm=llm,  # any LangChain LLM instance (assumed defined)
    max_token_limit=1000
)
Long-Term Memory
Persistent storage for learned knowledge, user preferences, historical patterns, and accumulated experience that persists across sessions and interactions.
  • Vector embeddings storage
  • Semantic search capabilities
  • Knowledge graph integration
  • User preference learning
  • Cross-session persistence
import networkx as nx

from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# Vector store for long-term memory, persisted to disk across sessions
embeddings = OpenAIEmbeddings()
vectorstore = Chroma(
    collection_name="agent_memory",
    embedding_function=embeddings,
    persist_directory="./memory_db"
)

# Store and retrieve memories
def store_memory(content, metadata=None):
    vectorstore.add_texts(
        texts=[content],
        metadatas=[metadata or {}]
    )

def retrieve_memories(query, k=5):
    return vectorstore.similarity_search(query, k=k)

# Knowledge graph memory: facts as (subject, predicate, object) triples
class KnowledgeGraphMemory:
    def __init__(self):
        self.graph = nx.DiGraph()

    def add_fact(self, subject, predicate, obj):
        # "obj" avoids shadowing the built-in "object"
        self.graph.add_edge(subject, obj, relation=predicate)

Implementation Strategies

Buffer-Based Memory
Simple FIFO or sliding-window approaches for managing conversation history and recent context; a token-aware sketch follows the list below.
  • Conversation buffer memory
  • Token-aware truncation
  • Sliding window context
  • Summary-based compression
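As a rough sketch of buffer-based, token-aware truncation: the hypothetical TokenAwareBuffer below evicts the oldest turns once an assumed token budget is exceeded. The character-based count_tokens heuristic is a placeholder for a real tokenizer.

from collections import deque

class TokenAwareBuffer:
    """Sliding-window buffer that drops the oldest turns over budget."""

    def __init__(self, max_tokens=2000):
        self.max_tokens = max_tokens  # assumed token budget
        self.turns = deque()

    def count_tokens(self, text):
        # Placeholder heuristic (~4 characters per token); a real
        # system would use the model's tokenizer
        return max(1, len(text) // 4)

    def total_tokens(self):
        return sum(
            self.count_tokens(u) + self.count_tokens(a)
            for u, a in self.turns
        )

    def add_turn(self, user_input, agent_response):
        self.turns.append((user_input, agent_response))
        # FIFO eviction: drop oldest turns until back under budget
        while self.total_tokens() > self.max_tokens and len(self.turns) > 1:
            self.turns.popleft()
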
Vector-Based Memory
Semantic similarity search using embeddings for intelligent information retrieval and context matching; a scoring sketch follows the list below.
  • Semantic similarity search
  • Embedding-based retrieval
  • Contextual relevance scoring
  • Multi-modal embeddings
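A minimal sketch of contextual relevance scoring, reusing the vectorstore defined earlier. similarity_search_with_score is Chroma's real API; the max_distance cutoff is an illustrative assumption (Chroma returns distances, so lower means more similar).

def retrieve_with_scores(query, k=5, max_distance=0.5):
    # (document, distance) pairs from the vector store
    results = vectorstore.similarity_search_with_score(query, k=k)
    # Keep only memories within the assumed relevance threshold
    return [
        (doc.page_content, score)
        for doc, score in results
        if score <= max_distance
    ]
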
Graph-Based Memory
Structured knowledge representation using graphs for complex relationship modeling and reasoning; a traversal sketch follows the list below.
  • Entity relationship modeling
  • Knowledge graph traversal
  • Reasoning over connections
  • Temporal relationship tracking
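A sketch of graph traversal and temporal tracking with networkx, extending the triple-based KnowledgeGraphMemory idea from earlier; the added_at timestamp and two-hop default are illustrative choices.

from datetime import datetime, timezone

import networkx as nx

class TemporalKnowledgeGraph:
    def __init__(self):
        self.graph = nx.DiGraph()

    def add_fact(self, subject, predicate, obj):
        # Temporal relationship tracking: stamp each edge on insertion
        self.graph.add_edge(
            subject, obj,
            relation=predicate,
            added_at=datetime.now(timezone.utc)
        )

    def related(self, entity, depth=2):
        # Traversal: every node reachable within `depth` hops
        if entity not in self.graph:
            return set()
        reachable = nx.single_source_shortest_path_length(
            self.graph, entity, cutoff=depth
        )
        return set(reachable) - {entity}
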

Complete Memory System Example

class HybridMemorySystem:
    """Combines buffer, vector, graph, and preference memory."""

    def __init__(self):
        # Short-term: conversation buffer (last 10 exchanges)
        self.conversation_memory = ConversationBufferWindowMemory(k=10)
        
        # Long-term: vector store for semantic retrieval
        self.vector_memory = Chroma(embedding_function=OpenAIEmbeddings())
        
        # Structured: knowledge graph (see KnowledgeGraphMemory above;
        # update_from_entities and get_related_facts are assumed extensions)
        self.knowledge_graph = KnowledgeGraphMemory()
        
        # User preferences (UserPreferenceStore is a hypothetical store)
        self.user_preferences = UserPreferenceStore()
    
    def store_interaction(self, user_input, agent_response, context=None):
        # Store in the conversation buffer
        self.conversation_memory.save_context(
            {"input": user_input}, 
            {"output": agent_response}
        )
        
        # Extract and store important information in long-term memory
        if self.is_important(user_input, agent_response):
            self.vector_memory.add_texts([f"{user_input} -> {agent_response}"])
            
        # Update the knowledge graph with extracted entities
        entities = self.extract_entities(user_input, agent_response)
        self.knowledge_graph.update_from_entities(entities)
    
    def retrieve_relevant_context(self, query):
        # Get the recent conversation window
        recent_context = self.conversation_memory.load_memory_variables({})
        
        # Search long-term memory for semantically similar entries
        relevant_memories = self.vector_memory.similarity_search(query, k=3)
        
        # Query the knowledge graph for related facts
        related_facts = self.knowledge_graph.get_related_facts(query)
        
        return {
            "recent": recent_context,
            "memories": relevant_memories,
            "facts": related_facts
        }
    
    def is_important(self, user_input, agent_response):
        # Placeholder heuristic; production systems often use an LLM
        # or a trained classifier to score importance
        return len(user_input) > 50
    
    def extract_entities(self, user_input, agent_response):
        # Placeholder; real systems use NER or LLM-based extraction
        return []
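A minimal usage sketch, assuming the classes above are in scope and embedding credentials are configured:

memory_system = HybridMemorySystem()
memory_system.store_interaction(
    "I prefer concise answers with code samples.",
    "Noted. I'll keep responses short and include code."
)
context = memory_system.retrieve_relevant_context("response style preferences")
print(context["memories"])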

Performance Optimization Techniques

Memory Compression
Reduce the memory footprint while preserving essential information through intelligent compression, as sketched below.
Techniques: Summarization, key information extraction, lossy compression for old data
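A sketch of summarization-based compression, assuming llm is any LangChain model exposing predict; the prompt wording and batching are illustrative.

def compress_old_memories(old_entries, llm):
    # Lossy compression: replace a batch of old entries with one summary
    joined = "\n".join(old_entries)
    return llm.predict(
        "Condense the following interaction log into a short paragraph, "
        "keeping names, preferences, and decisions:\n\n" + joined
    )
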
Hierarchical Storage
Implement tiered storage with different access patterns and retention policies, as in the sketch below.
Techniques: Hot/warm/cold storage tiers, automatic data migration, access pattern analysis
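One way to realize hot/warm/cold tiers, sketched with in-memory dicts standing in for real backends (e.g., a cache, a database, object storage); the TTL thresholds and promote-on-access policy are illustrative.

import time

class TieredMemoryStore:
    HOT_TTL = 3600          # < 1 hour since last access: hot tier
    WARM_TTL = 7 * 86400    # < 1 week: warm tier; older data goes cold

    def __init__(self):
        self.hot, self.warm, self.cold = {}, {}, {}

    def put(self, key, value):
        self.hot[key] = (value, time.time())

    def get(self, key):
        for tier in (self.hot, self.warm, self.cold):
            if key in tier:
                value, _ = tier.pop(key)
                # Promote on access: recently used data moves back to hot
                self.hot[key] = (value, time.time())
                return value
        return None

    def migrate(self):
        # Periodic demotion based on time since last access
        now = time.time()
        for key, (value, ts) in list(self.hot.items()):
            if now - ts > self.HOT_TTL:
                self.warm[key] = self.hot.pop(key)
        for key, (value, ts) in list(self.warm.items()):
            if now - ts > self.WARM_TTL:
                self.cold[key] = self.warm.pop(key)
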
Caching Strategies
Implement intelligent caching for frequently accessed memories and computed results; see the sketch below.
Techniques: LRU caching, semantic caching, precomputed embeddings, query result caching
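Precomputed embeddings and query-result caching can be layered in front of the calls from earlier with a simple LRU cache; embed_query and similarity_search are real LangChain methods, while the cache sizes are arbitrary.

from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_embedding(text):
    # Embeddings are deterministic per input, so repeated texts
    # hit the cache instead of the embedding API
    return tuple(embeddings.embed_query(text))

@lru_cache(maxsize=256)
def cached_search(query, k=5):
    # Query-result caching: identical queries skip the vector store
    docs = vectorstore.similarity_search(query, k=k)
    return tuple(doc.page_content for doc in docs)
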
Memory Cleanup
Automatically clean up outdated, irrelevant, or redundant information to maintain efficiency; see the sketch below.
Techniques: TTL-based expiration, relevance scoring, duplicate detection, privacy compliance
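A TTL-plus-deduplication sketch over a plain list of record dicts; the created_at field, 30-day default, and exact-match hashing are illustrative (production systems often dedupe on embedding similarity and must honor privacy deletion requests).

import hashlib
import time

def cleanup_memories(records, ttl_seconds=30 * 86400):
    # Each record is assumed to be a dict with "content" and
    # "created_at" (epoch seconds) keys
    now = time.time()
    seen = set()
    kept = []
    for record in records:
        # TTL-based expiration
        if now - record["created_at"] > ttl_seconds:
            continue
        # Duplicate detection via content hashing
        digest = hashlib.sha256(record["content"].encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(record)
    return kept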