Pangram verdict · v3.3
We believe that this document is a mix of AI-generated, and human-written content
AI likelihood · overall
MixedArticle text · 1,585 words · 6 segments analyzed
There is an implicit contract at the foundation of every database architecture decision you have ever made. You probably never wrote it down. Nobody does. It just… existed.The contract goes something like this: the caller is a human-authored application, running deterministic code, issuing predictable queries, reviewed by a developer before deployment. Writes are intentional. Connections are brief. When something goes wrong, a human notices. The database can be dumb and fast because the application layer is smart and careful.For forty years, this contract held. It shaped how we designed schemas, sized connection pools, granted permissions, and thought about failure modes. It worked because the assumption was correct.It is no longer correct. Agentic AI systems violate this contract at every layer simultaneously.In this article, I break down exactly which assumptions are failing, why they matter, and what to do about it - with concrete patterns and code. Let’s dig right in…Assumption - Deterministic CallerIn every application you have deployed before agents, the queries hitting your database were authored by a human. developer wrote the SQL developer code-reviewed it developer tested it and deployed it. This assumption runs so deep that the tooling reflects it automatically: the Postgres query planner builds statistics around observed query patterns, caching layers warm up on repeated queries, and connection pools are tuned around the expected number of concurrent queries of a known complexity.Agents work differently; they reason their way to queries. Different reasoning paths produce different queries against the same tables.An agent working on a customer analytics task might issue a join across five tables that has never been issued before, hold the connection while it thinks about the result, then issue a completely different follow-up. Your indexes cover the happy path. Your connection pool is sized for your observed peak. Neither of those holds when the agent can build any query depending on the data it needs.Statement TimeoutsStatement timeouts are your first line of defense. A human-authored query that takes 30 seconds is a bug that someone will notice. An agent query that takes 30 seconds might be a reasoning loop that no one is watching.So, set timeouts at the role level, not just the application level.CREATE ROLE agent_worker; ALTER ROLE agent_worker SET statement_timeout = '5s'; ALTER ROLE agent_worker SET idle_in_transaction_session_timeout = '10s';The idle_in_transaction_session_timeout is especially important.
Agents that pause mid-reasoning while holding an open transaction could be a legitimate situation.Assumption - Writes are IntentionalThe most dangerous assumption in database architecture is that every write was reviewed by a human before it happened. This was basically true for your entire career, but not anymore.Agents write autonomously. They write based on their current understanding of the task, which may be wrong. Agents write in loops when their tools return unexpected results. Agents write on retries when a transient network error makes them ‘think’ the first attempt failed. Agents can even write thousands of rows in the time it takes you to get a Slack notification that something looks off.Here’s a real documented failure pattern - an agent calling a legacy API receives HTTP 200 with an empty result set. The API failed silently because the database connection pool was exhausted downstream. The agent interprets “no data” as “no problem” and proceeds to process 500 transactions with incomplete data. No exception was raised. No alert fired. The log showed “decision: approved” on every record.The core fix here is to design your write paths assuming the caller might be wrong, might retry, and might not be watching the results.Soft Deletes EverywhereNever let an agent hard-delete anything. Use soft deletes as a baseline for any table an agent can write toALTER TABLE orders ADD COLUMN deleted_at TIMESTAMPTZ; ALTER TABLE orders ADD COLUMN deleted_by TEXT; -- 'agent:customer-support-v2', 'user:abc123' ALTER TABLE orders ADD COLUMN delete_reason TEXT;
-- Agents query this view; they never see deleted rows and can't accidentally undelete CREATE VIEW active_orders AS SELECT * FROM orders WHERE deleted_at IS NULL;The deleted_by column is more important than it looks. When you are debugging what happened two hours ago, “show me everything agent X deleted” is a query you will want to run.Append-only Event LogsFor operations where the stakes are higher - financial records, inventory changes, user state mutations - consider going further and making the table append-only.
The agent never issues UPDATE or DELETE. It issues INSERT with a new state and a reason:CREATE TABLE order_state_log ( id UUID DEFAULT gen_random_uuid() PRIMARY KEY, order_id UUID NOT NULL REFERENCES orders(id), previous_status TEXT, new_status TEXT NOT NULL, changed_by TEXT NOT NULL, changed_at TIMESTAMPTZ DEFAULT now(), reason TEXT, idempotency_key TEXT UNIQUE );This is the event sourcing pattern applied at the table level. A single append-only log table for your most sensitive entities gives you a complete audit trail and makes “undo” a projection query.Idempotency Keys Are Not OptionalAgents retry, and this is by design. Every orchestration framework operates on at-least-once delivery semantics. If a step fails, it runs again. Your write paths need to be designed for this.An idempotency key is a stable identifier that an agent includes with every write. The database rejects duplicates silently with a unique constraint. The agent gets a successful response either way.
Running the operation twice produces the same result as running it once.-- The agent generates this key from -- task_id + operation_type + target_id -- It is deterministic for the same logical -- operation, so retries produce the same key ALTER TABLE order_state_log ADD CONSTRAINT uq_idempotency_key UNIQUE (idempotency_key);In practice, the agent constructs the key like this:import hashlib
def make_idempotency_key(task_id: str, operation: str, target_id: str) -> str: raw = f"{task_id}:{operation}:{target_id}" return hashlib.sha256(raw.encode()).hexdigest()[:32]The task ID comes from the orchestration layer and is stable across retries of the same logical task. This means the agent can retry as many times as it needs to, and your database sees exactly one write per logical operation.Assumption - Connections are BriefTraditional connection pool sizing follows a straightforward mental model. Your application handles N concurrent requests. Each request needs one database connection for a brief period. You size your pool to slightly above your expected concurrency peak, add a little headroom, and you are done.Agents break this model in three ways. Agents hold connections longer A multi-step reasoning task may issue a query, pause to process the result with the LLM, issue another query, pause again, and repeat. Each pause holds the connection open. The connection time per task is no longer “query execution time” - it is “query execution time + LLM inference time x reasoning steps.” Agents fan out A single high-level agent task often spawns sub-agents to work in parallel.
One task becomes five simultaneous database sessions. This can exhaust connections when concurrent agent workflows holding db.session open across long IO waits until Postgres ran out of connection slots. Agents multiply unexpectedly In development, you had three agents. In production, you have thirty. Nobody updated the connection pool configuration.The fix is a dedicated connection pool for agent workloads, sized independently from your human-facing transactional application traffic# Rule of thumb: (num_agent_workers * avg_concurrent_steps * 0.5) # The 0.5 accounts for the fact that most agent steps # involve LLM time, not DB time
agent_engine = create_engine( DATABASE_URL, pool_size=10, # base pool for agents max_overflow=5, # burst capacity pool_timeout=3, # fail fast rather than queue pool_recycle=300, # recycle connections every 5 minutes pool_pre_ping=True, # validate connections before checkout connect_args={ "options": "-c statement_timeout=5000 -c idle_in_transaction_session_timeout=10000" } )The pool_timeout=3 is deliberate. When an agent cannot get a connection within 3 seconds, it should fail fast and retry with backoff, not queue indefinitely. Queued requests under a saturated pool is how you get cascading failures.For systems running many agents concurrently, add PgBouncer between your agents and Postgres. PgBouncer operates in transaction pooling mode, which means it returns a connection to the pool immediately after each transaction rather than holding it for the entire session. This is a significant multiplier on your effective connection capacity for agentic workloads.#
pgbouncer.ini [databases] mydb = host=postgres_host dbname=mydb
[pgbouncer] pool_mode = transaction # critical: release connection after each transaction max_client_conn = 500 # clients (agents) can connect up to this number default_pool_size = 20 # actual postgres connections (much smaller) reserve_pool_size = 5 # emergency capacity reserve_pool_timeout = 1.0 # fail fast if reserve is also exhaustedIn transaction pooling mode, 20 actual Postgres connections can serve 500 agent connections, because each agent only holds a Postgres connection for the duration of a single transaction, not the entire multi-step task.Assumption - Bad Queries Fail LoudlyIn a human-operated system, a slow or incorrect query surfaces quickly. The dashboard loads slowly. The API times out. An engineer runs EXPLAIN ANALYZE and finds the problem. The feedback loop is tight.Agents close that feedback loop. An agent that gets a slow query result just uses the result. An agent that gets an empty result set does not know whether the data genuinely does not exist or whether the query was wrong. It continues with its task, potentially writing decisions based on a bad read.This is a different class of failure from application errors. An exception is observable. A semantically wrong query that returns rows is not.The mitigation is building agent-specific observability into your database access layer. Standard slow query logs are not enough. You need to know which agent, which task, and which reasoning step produced a query. The most practical way to do this in Postgres is query commentsfrom sqlalchemy import text, event from sqlalchemy.engine import Engine
@event.listens_for(Engine, "before_cursor_execute") def add_agent_context_comment(conn, cursor, statement, parameters, context, executemany): agent_ctx = getattr(conn.info, "agent_context", None) if agent_ctx: statement = f"/* agent_id={agent_ctx['agent_id']}, task_id={agent_ctx['task_id']},