A small Python pattern for safer LLM tool calls

The biggest single source of bugs I see in agent systems is hallucinated tool calls. The model invokes a function with parameters it invented. An ID that doesn’t exist. A date in the wrong format. A file path that isn’t real. The function does its best, returns something weird, and the agent loop continues on bad state.

Here is the pattern I use to catch most of these before they cause damage. Wrap every tool call in a validator that checks the inputs against a schema before executing. Pydantic works well (the code below uses the Pydantic v2 API, such as model_dump):

from typing import Any, Callable
from pydantic import BaseModel, ValidationError

class GetItemParams(BaseModel):
    item_id: str
    include_details: bool = False

def safe_call(
    func: Callable[..., Any],
    params_model: type[BaseModel],
    raw_params: dict[str, Any],
) -> dict[str, Any]:
    """Validate raw_params against params_model, then call func with them."""
    # Step 1: reject bad inputs before the tool runs at all.
    try:
        validated = params_model(**raw_params)
    except ValidationError as e:
        return {"ok": False, "error": "invalid_parameters", "details": e.errors()}

    # Step 2: run the tool with the validated (and defaulted) values.
    try:
        result = func(**validated.model_dump())
    except Exception as e:  # broad on purpose: keep the agent loop alive
        return {"ok": False, "error": "execution_failed", "details": str(e)}

    return {"ok": True, "result": result}


# Wire your real tool up like this:
def get_item(item_id: str, include_details: bool = False):
    ...  # your implementation

response = safe_call(get_item, GetItemParams, {"item_id": "X123"})
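
One caveat, sketched here assuming Pydantic v2: models ignore unknown keys by default, so a call that misspells or invents a parameter name can have that key silently dropped. The stricter variant below is hypothetical (StrictGetItemParams and validation_error_types are illustrative names, not part of the wrapper above) and shows how forbidding extras surfaces the invented key as its own error.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, ValidationError


class StrictGetItemParams(BaseModel):
    # Hypothetical stricter variant of GetItemParams: reject unknown keys
    # instead of silently dropping them (Pydantic ignores extras by default).
    model_config = ConfigDict(extra="forbid")

    item_id: str
    include_details: bool = False


def validation_error_types(raw_params: dict[str, Any]) -> set[str]:
    """Return the set of Pydantic error types raised for raw_params."""
    try:
        StrictGetItemParams(**raw_params)
    except ValidationError as e:
        return {err["type"] for err in e.errors()}
    return set()


# A misspelled key is reported both as a missing field and as an unknown one.
print(validation_error_types({"itemId": "X123"}))
# A well-formed call produces no errors.
print(validation_error_types({"item_id": "X123"}))
```

Whether to forbid extras is a judgment call: it makes the feedback more precise, but it also rejects calls that would otherwise have worked with a harmless stray key.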

Three small things this gives you.

First, when the model sends bad parameters, it gets a structured error back instead of a stack trace or a silent failure. The error names the field that was wrong and says why. In my experience, models usually correct themselves on the next turn when they get readable feedback like this.
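
As a rough sketch of what "readable feedback" could look like, the helper below flattens a failed safe_call response into a few lines of text for the next model turn. The function name format_tool_error and the message wording are assumptions for illustration; the "loc" and "msg" keys are the ones Pydantic puts in each error entry.

```python
from typing import Any


def format_tool_error(tool_name: str, response: dict[str, Any]) -> str:
    """Turn a failed safe_call response into a short message for the model."""
    if response["error"] == "invalid_parameters":
        lines = [f"Call to {tool_name} was rejected: invalid parameters."]
        for err in response["details"]:
            # Pydantic reports the offending field path in "loc" and a
            # human-readable explanation in "msg".
            field = ".".join(str(part) for part in err["loc"]) or "(root)"
            lines.append(f"- {field}: {err['msg']}")
        return "\n".join(lines)
    return f"Call to {tool_name} failed while running: {response['details']}"


# Hand-written sample in the shape of a Pydantic error entry.
sample = {
    "ok": False,
    "error": "invalid_parameters",
    "details": [{"type": "missing", "loc": ("item_id",), "msg": "Field required"}],
}
print(format_tool_error("get_item", sample))
```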

Second, the agent loop sees a consistent response shape from every tool. Every call returns either {"ok": True, "result": ...} or {"ok": False, "error": ..., "details": ...}. The loop logic doesn’t need to know about each tool individually.
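
To illustrate the uniform shape, here is a minimal dispatcher sketch. Everything in it is an assumption for illustration: the registry layout, the dispatch function, the echo_tool stand-in, and the extra "unknown_tool" error code for tool names the model made up. The point is only that the loop branches on "ok" and nothing else.

```python
from typing import Any, Callable

Response = dict[str, Any]
WrappedTool = Callable[[dict[str, Any]], Response]


def dispatch(
    registry: dict[str, WrappedTool],
    tool_name: str,
    raw_params: dict[str, Any],
) -> Response:
    """Route one model-requested call to its wrapped tool."""
    tool = registry.get(tool_name)
    if tool is None:
        # "unknown_tool" is a hypothetical extra error code, not part of safe_call.
        return {"ok": False, "error": "unknown_tool", "details": tool_name}
    return tool(raw_params)


def echo_tool(raw_params: dict[str, Any]) -> Response:
    # Stand-in for a wrapped tool. A real entry might be
    # functools.partial(safe_call, get_item, GetItemParams).
    return {"ok": True, "result": raw_params}


registry: dict[str, WrappedTool] = {"echo": echo_tool}

for name in ("echo", "delete_everything"):
    response = dispatch(registry, name, {"item_id": "X123"})
    # The loop branches on "ok" only; it never inspects tool-specific fields.
    print(name, "->", "success" if response["ok"] else response["error"])
```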

Third, you can log validation failures separately from execution failures. After a week or two, you have a clear picture of which tools the model misuses most often. Those are the tools whose descriptions in the prompt you should tighten first, which reduces the failures at the source.
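
One simple way to keep the two failure kinds apart is a counter keyed by tool name and error kind. This is a sketch with made-up tool names and hand-written sample responses; record_outcome and most_misused are illustrative helpers, and a real setup would more likely write to a logging or metrics system.

```python
from collections import Counter
from typing import Any


def record_outcome(counts: Counter, tool_name: str, response: dict[str, Any]) -> None:
    """Count failed calls, keyed by (tool name, error kind)."""
    if not response["ok"]:
        counts[(tool_name, response["error"])] += 1


def most_misused(counts: Counter, n: int = 3) -> list[tuple[str, int]]:
    """Tools ranked by validation failures only, ignoring execution failures."""
    invalid = Counter(
        {tool: c for (tool, error), c in counts.items() if error == "invalid_parameters"}
    )
    return invalid.most_common(n)


counts = Counter()
# Hand-written sample responses; tool names are made up for illustration.
samples = [
    ("get_item", {"ok": False, "error": "invalid_parameters", "details": []}),
    ("get_item", {"ok": False, "error": "invalid_parameters", "details": []}),
    ("search", {"ok": False, "error": "invalid_parameters", "details": []}),
    ("search", {"ok": False, "error": "execution_failed", "details": "timeout"}),
    ("get_item", {"ok": True, "result": {}}),
]
for tool_name, response in samples:
    record_outcome(counts, tool_name, response)

print(most_misused(counts))
```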

The bigger fix for hallucinated calls is better tool descriptions to begin with. But this wrapper catches the residual ones, and the structured error feedback genuinely improves model behavior on retries.

Nothing fancy. Just a wrapper. It probably saves me an hour of debugging per week.