Tool Design
I’ve spent the last few months building a conversational agent focused on improving discovery and reducing the support burden. I know what you’re thinking… another chatbot is exactly what the world needed. You’re welcome.
In any case, it has been fun operating at the cutting edge of this AI summer, though development has been a rollercoaster, with traditional software engineering and all its rigour replaced by stochastic parrots and the prompt alchemy that feeds them. As with the agentic latency problems encountered along the way, finding the right tool abstractions proved challenging and required several rounds of iteration. To help others exploring this space, I've collated some pithy guidelines for designing toolsets based on my (limited) experience and the (minimal) literature available.
Simple
Tools should be simple. Avoid cognitive leaps that make it difficult to know when and how to use them. Litmus test: could a human without any expertise use the tool effectively?
Prefer concise descriptions. Bloated descriptions are a smell, implying the tool is brittle and/or the underlying implementation has limitations.
Follow Postel’s Law and "be liberal in what you accept". Try to handle ambiguous or malformed input (but work to minimise this bad input over time), offloading as much complexity as possible to tool implementations. Any implicit actions can be communicated back to the agent, if useful, e.g. when defaults are applied or corrections are made. On hard failures, give the LLM a chance to self-remediate by providing descriptive errors.
Use deterministic processing instead of LLM generation where possible. For instance, validate referenced entities to mitigate hallucinations and use "agentic pointers", e.g. generate [CITATION_N] and [RESOURCE_X] tokens instead of URIs/content and resolve them in a postprocessing step.
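The pointer-resolution step can be as simple as a regex substitution in postprocessing. A sketch, with the token format and citation map invented for illustration:

```python
import re

# "Agentic pointer" pattern: the LLM emits opaque tokens such as
# [CITATION_1], and deterministic postprocessing swaps in the real
# URIs, so the model never has the chance to hallucinate a link.

CITATIONS = {
    1: "https://example.com/docs/getting-started",
    2: "https://example.com/docs/billing",
}

def resolve_citations(text: str, citations: dict[int, str]) -> str:
    def replace(match: re.Match) -> str:
        n = int(match.group(1))
        # Unknown pointers are dropped rather than guessed at.
        return citations.get(n, "")
    return re.sub(r"\[CITATION_(\d+)\]", replace, text)

draft = "See the setup guide ([CITATION_1]) before billing ([CITATION_2])."
resolved = resolve_citations(draft, CITATIONS)
```

Because the mapping lives outside the model, a bad pointer degrades to an empty string instead of a plausible-looking but fabricated URL.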
Lean
The fewer tools, the better. Quality starts to degrade when there are too many options. To help with this, merge similar tools and keep tools semantically disjoint. Tool definitions should not overlap in order to prevent confusion and avoid redundant calls. If appropriate, raise the level of abstraction and group related functionality together, e.g. search_entities > search_entity_x + search_entity_y.
Err on the side of more context over more tools. Models are increasingly good at not getting "lost in the middle", so only split functionality out once you have a concrete reason to. Regardless, reducing token usage is always beneficial, so consider adding a parameter to control output verbosity, e.g. response_format: "concise" | "detailed". This is particularly useful for data-fetching tools and can be extended further by allowing the caller to specify exact output properties, à la GraphQL, for greater flexibility.
Do not mirror existing APIs by default. Roll up functionality rather than exposing the underlying interfaces, e.g. get_entity_context > get_entity_for_id + list_entity_children, in order to reduce the number of decision points and LLM round trips.
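To make the rollup concrete, here is a sketch of get_entity_context combining the two underlying calls into one tool response; the data model is invented for illustration:

```python
# One tool call instead of two: the entity and its children come back
# together, removing a decision point and an LLM round trip.

ENTITIES = {
    "a": {"name": "Root", "parent": None},
    "b": {"name": "Child 1", "parent": "a"},
    "c": {"name": "Child 2", "parent": "a"},
}

def get_entity_context(entity_id: str) -> dict:
    entity = ENTITIES.get(entity_id)
    if entity is None:
        # Descriptive error so the LLM can self-remediate.
        return {"error": f"no entity with id '{entity_id}'; "
                         "check the id returned by search"}
    children = [eid for eid, e in ENTITIES.items()
                if e["parent"] == entity_id]
    return {"entity": entity["name"], "children": children}
```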
Loosely Coupled
If possible, avoid inter-tool dependencies. If tools must be called in sequence, try to merge them, e.g. tool_x_then_y. If dependencies are unavoidable, use namespacing to delineate the group, e.g. entity_x_search + entity_x_lookup.
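For instance, a tool manifest might look like the following sketch, where a shared prefix signals that the second tool consumes ids produced by the first; the names and schemas are invented for illustration:

```python
# Namespaced tool pair: the "invoice_" prefix groups the dependent
# tools, and the lookup description points back at the search output.

TOOLS = [
    {
        "name": "invoice_search",
        "description": "Find invoices matching a query; "
                       "returns a list of invoice ids.",
        "parameters": {"query": {"type": "string"}},
    },
    {
        "name": "invoice_lookup",
        "description": "Fetch full details for an invoice id "
                       "returned by invoice_search.",
        "parameters": {"invoice_id": {"type": "string"}},
    },
]
```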
When referencing other tools, avoid exact names and prefer vague mentions ("use metadata tool"). In an ideal world, even the system prompt should be tool-agnostic in order to make things easier to change (ETC).
References
- Writing Effective Tools for Agents – Anthropic
- o3/o4-mini Function Calling Guide – OpenAI
- Best Practices for Defining Functions – OpenAI