In the previous part of the Build A Basic AI Agent From Scratch series, we added the essential tools that let our agent work autonomously for us. We gave it the ability to find files, read and write files, run bash commands and fetch content from the web. With just these tools, we got a very capable agent.

What happens when the agent runs long and complex tasks?

The current agent works very well, but we want it to get a lot of work done, and that requires staying on a task for long stretches of time. Right now, if we give our agent a long and complex task, we will find that it does not think long term and stops working after making only a little progress.

This is to be expected, because the LLM is trained to behave conversationally: it expects a back-and-forth, question-and-answer exchange. That is fine for a simple chatbot, but our agent needs to take a request and work on it for a long time before returning a result.

Long task planning

The next ability we will give our agent is planning for long and complex tasks.

The abilities our agent needs are:

- Understand the goal of the task
- Plan how to tackle the task beforehand
- Break the task into concrete steps
- Keep track of pending, in-progress and completed tasks
- Rethink the approach if something goes wrong with the current plan
- Check that everything planned is actually done before stopping

To give our agent these abilities, we will rely on the previous part’s addition: tools. We will also explain to the model how to do long task planning in its system prompt.

New tool: Scratchpad

This is a very simple but powerful tool.
We are just giving the model a place to write its thoughts and read them again later. The main benefit of this tool is that it pushes the model to think through the goal and plan the whole approach before it starts working on it.

The tool keeps the scratchpad content in memory instead of a file or database, which is fine because we don’t want to share the scratchpad content between sessions.
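One caveat on the in-memory approach: it only keeps sessions apart if each session gets a fresh object, for example because every session runs in a new process. If a single process ever hosts several agent sessions, the state should live per session instead. The sketch below illustrates the idea for the scratchpad, assuming a hypothetical `SessionState` class and `new_session` helper; neither is part of the series’ code.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Hypothetical per-session holder for planning-tool state (not from the series' repo)."""

    scratchpad: str = ""

    def write_scratchpad(self, content: str) -> str:
        # Each write replaces the previous content, like the scratchpad tool.
        self.scratchpad = str(content).strip()
        return "Successfully wrote content to the scratchpad"

    def read_scratchpad(self) -> str:
        return self.scratchpad or "(empty)"


def new_session() -> SessionState:
    # Hypothetical helper: every session gets its own object, so nothing
    # written in one session is visible from another in the same process.
    return SessionState()


first = new_session()
second = new_session()
first.write_scratchpad("Plan: explore the repo, then run the tests")
print(first.read_scratchpad())   # Plan: explore the repo, then run the tests
print(second.read_scratchpad())  # (empty)
```

Tool functions exposed to the model would then be bound to the current session’s object, so two sessions never read each other’s notes.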
Here’s the Python implementation:

```python
class Scratchpad:
    """Read and write from an in-memory scratchpad"""

    def __init__(self):
        self._content = ""

    def read(self) -> str:
        if self._content == "":
            return "(empty)"
        return self._content

    def write(self, content: str) -> str:
        # Each write replaces the previous content entirely.
        self._content = str(content).strip()
        return self._content


# A single module-level instance used by both tool functions below.
scratchpad = Scratchpad()


def read_scratchpad():
    """Read the contents of the scratchpad"""
    return scratchpad.read()


def write_scratchpad(content: str):
    """
    Write into the scratchpad. The previous content will be overwritten.
    """
    scratchpad.write(content)
    return "Successfully wrote content to the scratchpad"
```

You can find and clone this code in this blog series’ GitHub repo: https://github.com/rogiia/basic-agent-harness

New tool: To-do list

A to-do list allows the agent to decompose the work into tasks and keep track of them: what’s left to do (pending), what it’s currently working on (in progress) and what is already finished (done). Tasks can also be marked as cancelled or failed.

This tool also enforces some good practices: it doesn’t allow multiple tasks to be in progress at the same time, it rejects invalid task statuses and it doesn’t allow duplicate task ids. It also counts how many times a failed task has been retried and tells the model to stop and escalate to the user once the retry limit is reached.

Just like the scratchpad, this tool keeps the to-do list in memory instead of a file or database. This is also fine because we don’t want to share the to-do list between agent sessions.

```python
RETRY_LIMIT = 3  # Maximum number of retries allowed for a failed task


class ToDoList:
    """Helper class to hold a to-do list in memory"""

    statuses = ["pending", "in_progress", "done", "cancelled", "failed"]

    def __init__(self):
        self._items = []

    def read(self, include_completed=False):
        """Read the to-do list (done and cancelled items are hidden by default)"""
        if include_completed:
            return [item.copy() for item in self._items]
        return [item.copy() for item in self._items
                if item["status"] != "done" and item["status"] != "cancelled"]

    def _check_single_in_progress(self, id, status):
        """Reject a second in_progress item; the item `id` itself is ignored."""
        if status != "in_progress":
            return
        for item in self._items:
            if item["status"] == "in_progress" and item["id"] != id:
                raise Exception(f"To do item {item['id']} is already in progress. "
                                "Only one item can be in progress at a time.")

    def append(self, id, content, status):
        if status not in ToDoList.statuses:
            raise Exception(f"Invalid status {status}. "
                            "Valid to-do statuses: pending, in_progress, done, "
                            "cancelled, failed")
        if self.contains(id):
            raise Exception(f"To do item {id} already exists!")
        self._check_single_in_progress(id, status)
        new_item = {"id": id, "content": content, "status": status, "retries": 0}
        self._items.append(new_item)
        return new_item.copy()

    def contains(self, id) -> bool:
        """Check if the to do list contains an item with a specific id"""
        return any(item["id"] == id for item in self._items)

    def update(self, id, content, status):
        if status is not None and status not in ToDoList.statuses:
            raise Exception(f"Invalid status {status}. "
                            "Valid to-do statuses: pending, in_progress, done, "
                            "cancelled, failed")
        for item in self._items:
            if item["id"] == id:
                # Validate before changing anything, so a rejected update is a no-op.
                self._check_single_in_progress(id, status)
                if content is not None:
                    item["content"] = content
                if status is not None:
                    prev_status = item["status"]
                    item["status"] = status
                    # A failed task being set back to in_progress is a retry attempt.
                    if prev_status == "failed" and status == "in_progress":
                        item["retries"] += 1
                return item.copy()
        raise Exception(f"To do item with id {id} not found")


todo_store = ToDoList()


def todo_append(id, content, status) -> str:
    """Append a new to do item to the to do list"""
    id_str = str(id)
    content_str = str(content)
    status_str = str(status)
    try:
        todo_store.append(id_str, content_str, status_str)
        return f"Successfully appended to do item {id_str} in to do list!"
    except Exception as e:
        return f"Failed to append to do item: {e}"


def todo_list(include_completed=False) -> str:
    """List all the items in the to do list"""
    items = todo_store.read(include_completed)
    result = f"To Do List ({len(items)} items)\n"
    for status in ToDoList.statuses:
        count = sum(1 for i in items if i["status"] == status)
        result += f"{count} {status} items\n"
    result += "-----\n"
    for item in items:
        retry_note = f", {item['retries']} retries" if item["retries"] > 0 else ""
        result += f"- [{item['id']}] {item['content']} ({item['status']}{retry_note})\n"
    return result


def todo_update(id, content=None, status=None) -> str:
    """Update the content and/or status of an existing to do item"""
    if content is None and status is None:
        return "No content or status was given to update. Nothing to do."
    # todo_append stores ids as strings, so normalize the id the same way.
    id = str(id)
    try:
        item = todo_store.update(id, content, status)
        retries = item["retries"]
        if item["status"] == "in_progress" and retries > 0:
            if retries >= RETRY_LIMIT:
                return (
                    f"Updated to do item {id} to in_progress - "
                    f"but this is retry {retries} of {RETRY_LIMIT} (retry limit reached). "
                    "Do not retry again. Escalate to the user instead."
                )
            return (
                f"Successfully updated to do item {id}! "
                f"Retry attempt {retries} of {RETRY_LIMIT}."
            )
        return f"Successfully updated to do item {id}!"
    except Exception as e:
        return f"Failed to update to do item {id}: {e}"
```

New system prompt

All the long-term planning strategies that cannot be implemented as tools are explained to the model in the system prompt. There we explain how to plan using the process described at the beginning of the article, and how to use the new tools during planning.

For more details, read the system prompt below.

I also added a short note to the system prompt telling the model that, unless stated otherwise, the project it has to work on is in the current directory.

```python
{
    "role": "system",
    "content": (
        "You are a capable coding and research assistant.\n\n"
        "## Available tools\n\n"
        "Action tools: read_file, write_file, edit_file, glob_files, grep, run_bash, webfetch\n\n"
        "Planning tools:\n"
        "- Scratchpad (read_scratchpad / write_scratchpad): your private working memory. "
        "Use it to think through an approach, store intermediate findings, or draft content "
        "before committing. Each write fully replaces the previous content.\n"
        "- To-do list (todo_append / todo_list / todo_update): a persistent task tracker. "
        "Items carry a status: pending, in_progress, done, cancelled, or failed.\n\n"
        "## Working directory\n\n"
        "The current working directory is always the user's project root. "
        "When asked to work on a project or codebase without a specified path, "
        "start by exploring '.' with glob_files or run_bash. "
        "Never ask the user to supply a path.\n\n"
        "## How to plan\n\n"
        "For complex or multi-step tasks (roughly 3 or more distinct steps, or when the "
        "path forward is unclear):\n"
        "1. Write your initial thinking and approach to the scratchpad before acting.\n"
        "2. Break the work into concrete steps and add each one to the to-do list with "
        "todo_append (status: pending).\n"
        "3. Before starting a step, mark it in_progress with todo_update. "
        "Keep only one item in_progress at a time.\n"
        "4. Mark items done immediately after completing them - do not batch completions.\n"
        "5. Call todo_list to review remaining work before moving to the next step.\n"
        "6. Mark tasks cancelled if they become unnecessary.\n\n"
        "For simple, single-step tasks: act directly without creating todos.\n\n"
        "Planning tool calls (write_scratchpad, todo_append, todo_update, todo_list) "
        "are internal bookkeeping, not responses to the user. After any planning tool "
        "call, always continue working immediately - make your next tool call or, once "
        "the task is fully complete, give a substantive final answer. "
        "Never emit an empty or whitespace-only message.\n\n"
        "## Replanning\n\n"
        "After every tool result, check whether the outcome matched your expectation. "
        "If a tool returns an error, unexpected output, or reveals information that "
        "changes your understanding of the task, do not move to the next planned step - "
        "replan first.\n\n"
        "When a step fails:\n"
        "1. Diagnose in the scratchpad - is this a recoverable input error (wrong path, "
        "typo, wrong argument) or a deeper problem (wrong approach, wrong assumption)?\n"
        "2. Mark the task failed: todo_update(id, status='failed').\n"
        "3. Choose a recovery action:\n"
        " - Retry: the failure is correctable. Fix the input and set the task back to "
        "in_progress. The tool will report which retry attempt this is.\n"
        " - Replace: the approach is wrong. Cancel the task and add a revised one.\n"
        " - Reorder: new information makes a different task more urgent. Update the "
        "pending items before continuing.\n"
        "4. If todo_update reports that the retry limit has been reached, stop retrying. "