Agent Inspector (Debug your agent and LLM actions) - Assembly of Agents


Jan 22, 2025 - 21:50

This is a submission for the Agent.ai Challenge: Assembly of Agents (See Details)

What I Built

Agent Inspector is a must-have tool for anyone building and iterating on agents on Agent.ai. It provides information beyond what Agent.ai's built-in debugger shows: it tells you what format an agent's output should be in, validates whether the agent is in fact returning that output, assesses the toxicity of the output, and provides a clear overall Pass/Fail result. If there's an opportunity to improve the prompt or to provide more data to the agent, it makes suggestions.

The agent has become useful enough that I use it every time I create a new agent or iterate on an existing one.

Without giving away all the secrets publicly, here are some of the things that have gone into it:

  • LLM actions, with specific models chosen for the special skills they have.
  • Invoking agents. I created another utility agent that simply grabs the current time and date and returns them in a form that is useful for agents. The current date and time agent uses a serverless function to grab the date and time and return a JSON response that LLMs can easily understand, so they handle time-related requests better. This modular way of building let me both help others build agents that need the time and date and power the functionality of this agent. The Debug Agent is also intended to be used by other agents - that is the primary way to use it.
  • To build the Agent Inspector and ensure quality in all of its tests, I actually had to build an agent for testing it.
  • Multiple prompt engineering techniques.
  • A serverless function, which the agent reaches by invoking the current date and time agent mentioned above.
  • Workarounds for some issues found in the if statement action and for the inability to export multiple variables to other agents. This agent has a companion JSON variant that returns the data in JSON so you can take automated action based on the test results.
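
To make the date and time utility more concrete, here is a minimal sketch of what such a serverless function could look like. This is not the actual function behind my agent: the handler signature, the field names, and the choice of UTC are all assumptions made for illustration. The idea it shows is simply returning the time in several redundant, explicit forms so an LLM does not have to infer the weekday or month from a raw timestamp.

```python
import json
from datetime import datetime, timezone


def current_datetime_payload(now=None):
    """Describe a moment in time in several explicit, LLM-friendly forms.

    All field names here are hypothetical; pick whatever your prompts expect.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "iso_8601": now.isoformat(),
        "date": now.strftime("%Y-%m-%d"),
        "time_24h": now.strftime("%H:%M:%S"),
        "day_of_week": now.strftime("%A"),
        "month_name": now.strftime("%B"),
        "timezone": "UTC",  # assumption: the function always reports UTC
    }


def handler(event=None, context=None):
    """Hypothetical serverless entry point returning the payload as JSON."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(current_datetime_payload()),
    }


if __name__ == "__main__":
    # Print the JSON body an invoking agent would receive.
    print(handler()["body"])
```

An agent that invokes this only needs to drop the JSON body into its prompt; spelling out the weekday and month name saves the model from doing date arithmetic.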

Tests performed by the Agent Inspector:

  • Expected data type of the output, based on the prompt.
  • Validation that the output actually matches the prompt - with confidence score.
  • Relevancy of the output to the prompt provided - with confidence score.
  • Likelihood of hallucination - validates that the LLM has all the information needed to provide an answer - with confidence score.
  • Toxicity - is the response using offensive or harmful language? (This takes the prompt into account and does not flag a response as toxic simply for using one "bad" word, so if the prompt discusses a subject in an academic sense, the response is not reported as toxic unless it goes too far.)
  • Fluffy/substance test of text answer - checks the response for fluffy content that doesn't have much substance. Think of it like this: if it's supposed to be a blog post, is this something a reader is just going to pass to an LLM to summarize because it's not concise and meaty enough?
  • Suggested prompt revisions - based on all of the tests, the agent suggests improvements to encourage better results from your LLM action or agent.
  • Execution timing - know how long your agent takes to execute from start to finish. The agent also provides some best practices and norms.
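
To illustrate how results like these can be consumed automatically (the purpose of the companion JSON variant), here is a rough sketch of a report structure with an overall Pass/Fail. The field names, the 0.7 confidence threshold, and the rule "every check must pass with enough confidence" are assumptions for the example, not the actual logic inside Agent Inspector.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CheckResult:
    name: str          # hypothetical check identifier
    passed: bool
    confidence: float  # 0.0 to 1.0
    note: str = ""


def overall_verdict(checks, min_confidence=0.7):
    """Assumed rule: Pass only if every check passed with enough confidence."""
    ok = all(c.passed and c.confidence >= min_confidence for c in checks)
    return "Pass" if ok else "Fail"


def build_report(checks, elapsed_seconds, min_confidence=0.7):
    """Assemble a JSON-serializable report for a downstream agent."""
    return {
        "overall": overall_verdict(checks, min_confidence),
        "execution_seconds": round(elapsed_seconds, 3),
        "checks": [asdict(c) for c in checks],
    }


checks = [
    CheckResult("format_validation", True, 0.95),
    CheckResult("relevancy", True, 0.90),
    CheckResult("hallucination_risk", False, 0.80,
                "prompt does not supply the data it asks about"),
]
report = build_report(checks, elapsed_seconds=4.2)
print(json.dumps(report, indent=2))
```

A calling agent can branch on the single `overall` field and only read the individual checks when something fails.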

Why it meets this criteria

This agent relies on another agent for a step in its process.

Demo

Watch a video showing Agent Inspector as well as instructions to set it up

Agent.ai Experience

Overall, I've enjoyed the experience, and I want to push the platform to do things it was never designed to do. I'm providing feedback on a lot of areas where the platform can be improved - understand that my feedback comes from a place of appreciating it for what it is and wanting to see it thrive further. Being a developer, I'm always going to want more, but there are limitations I ran into that prevent me from delivering even more with this agent and other agents.

Feedback for the Agent.ai team.
