ROSTAN Technologies
Building Production AI Agents with Python: LangGraph, Tool Calling & Memory Architecture

  • Article By: Rostan Team
  • Jan 24, 2024

LLM "chat" is table stakes. The real engineering challenge in 2025 is building autonomous agents that plan multi-step actions, call external tools, maintain state across sessions, and recover from errors — without a human in the loop. This post digs into LangGraph, OpenAI tool calling, and the memory patterns needed for production-grade Python AI agents.

Why LangGraph Over Simple Chains

LangChain Expression Language (LCEL) is great for linear pipelines but falls apart when you need cycles — an agent that checks a result, decides to retry, calls a different tool, then loops back. LangGraph models the agent as a directed graph with state, which maps exactly to how real agents behave.

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]   # appended across steps, never overwritten
    tool_calls: list
    iteration: int

# call_llm, execute_tools, final_summary and route_decision are the node
# and router functions you implement for your agent.
graph = StateGraph(AgentState)
graph.add_node("llm_call",  call_llm)
graph.add_node("tool_exec", execute_tools)
graph.add_node("summarise", final_summary)

graph.add_conditional_edges("llm_call", route_decision, {
    "tools":  "tool_exec",
    "finish": "summarise",
    "retry":  "llm_call",
})
graph.add_edge("tool_exec", "llm_call")
graph.add_edge("summarise", END)   # terminate after the summary node
graph.set_entry_point("llm_call")
app = graph.compile()

Tool Calling Architecture

OpenAI-style tool calling (native to GPT-4o, and available for Claude and Gemini either through their own formats or OpenAI-compatible endpoints) passes a JSON schema to the model. The model returns a tool_call object — your code executes it, appends the result as a tool-role message, and re-calls the model. The key production concern is tool idempotency — always design tools that are safe to call twice:

tools = [
    {
        "type": "function",
        "function": {
            "name": "query_oracle_db",
            "description": "Run a read-only SQL query on the Oracle ERP database",
            "parameters": {
                "type": "object",
                "properties": {
                    "sql": {"type": "string", "description": "SELECT statement only"}
                },
                "required": ["sql"]
            }
        }
    }
]

import cx_Oracle

# Always validate before execution. Raise rather than assert:
# asserts are stripped when Python runs with -O.
def query_oracle_db(sql: str) -> dict:
    cleaned = sql.strip().rstrip(";")
    if not cleaned.upper().startswith("SELECT"):
        raise ValueError("Only SELECT statements are allowed")
    with cx_Oracle.connect(DSN) as conn:
        cur = conn.cursor()
        cur.execute(cleaned)
        return {"rows": cur.fetchmany(50),
                "columns": [d[0] for d in cur.description]}

Memory Architecture for Long-Running Agents

There are three memory tiers every production agent needs:

  • In-context (working memory): The current message list. Prune aggressively — keep last 10 turns + system prompt + tool results. Token cost scales linearly.
  • External short-term (episodic): Redis or Oracle NoSQL. Store the last 20 tool results keyed by session ID. Agent retrieves relevant entries via a summary query.
  • Long-term (semantic): Oracle Vector Store or Chroma. Embed past sessions, retrieve top-5 similar by cosine similarity before each LLM call. This gives the agent "experience".

# Long-term (semantic) memory retrieval pattern
from oracle_vector import VectorStore

vs = VectorStore(dsn=ORACLE_DSN, table="agent_memory")
query_embedding = embed(current_task_description)
relevant_memories = vs.similarity_search(query_embedding, k=5, threshold=0.82)
system_prompt += format_memories(relevant_memories)
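The "prune aggressively" rule for the in-context tier amounts to a few lines. A sketch assuming OpenAI-style message dicts with a "role" key (the `prune_context` name is ours):

```python
# Working-memory pruning: keep the system prompt plus only the most
# recent N messages (user turns, assistant turns, and tool results alike).
def prune_context(messages, keep_last=10):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```

Anything pruned here is not lost: it should already have been written to the episodic or semantic tier before being dropped from the context window.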

Error Recovery & Guardrails

Agents fail silently in production. Implement: (1) max_iterations guard, (2) tool timeout wrapper, (3) LLM output validator using Pydantic models, (4) fallback model (GPT-4o-mini if GPT-4o times out). Log every tool call and LLM response to Oracle ADW for audit and fine-tuning dataset creation.

