Teaching Large Language Models (LLMs) to do Math Correctly

Jan 15, 2025 - 21:08

Large language models often struggle with math: they predict text rather than run calculations, so multi-digit arithmetic and precise decimals are easy for them to get wrong. One way to fix this is to build an AI agent and give it a tool that evaluates mathematical expressions with a real math library. The model decides what to calculate, and the library does the calculation. Let's walk through code that does this and explain each part.

Setting Up the Code

First, we import the libraries our AI agent needs:

import { generateText, tool } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { z } from "zod";
import { evaluate } from "mathjs";

These lines import everything we need. The generateText and tool functions come from the "ai" package (the Vercel AI SDK). createOpenAI from "@ai-sdk/openai" creates the provider that talks to OpenAI. The z object from "zod" is a schema builder that describes the shape of the data a tool expects. And evaluate from "mathjs" does the actual math.

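To use these, install them by running this command in your terminal. The code in this post follows the AI SDK 4.x API, which was current when it was written; later major versions renamed some options (AI SDK 5, for example, expects inputSchema instead of parameters in tool definitions), so check which version npm installs: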

npm install ai @ai-sdk/openai zod mathjs

Connecting to OpenAI

Next, we create an OpenAI provider, which is how the AI SDK talks to OpenAI:

const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

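createOpenAI returns an OpenAI provider configured with your API key; no request is sent until a model is actually called. The key is read from the OPENAI_API_KEY environment variable, which keeps it out of the source code and out of version control. (If you omit apiKey, createOpenAI falls back to OPENAI_API_KEY by default, so passing it explicitly mainly documents where the key comes from.)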
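If OPENAI_API_KEY is not set, you typically only find out when the first request fails. A small guard makes the problem obvious at startup. requireEnv is a hypothetical helper, not part of the AI SDK; this is a minimal sketch assuming a Node.js runtime:

```typescript
// Hypothetical helper: read a required environment variable or fail fast
// with a clear message, instead of letting the first API call fail later.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage with the provider from above (assumes createOpenAI is imported):
// const openai = createOpenAI({ apiKey: requireEnv("OPENAI_API_KEY") });
```

Empty or whitespace-only values are treated as missing, since they would fail authentication just like an absent key.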

Creating the Math Tool

Now, we create a tool that will do the math for us. We define it like this:

const math = tool({
  description: "A tool for evaluating mathematical expressions",
  parameters: z.object({
    expression: z
      .string()
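      .describe(
        "The mathematical expression to evaluate in a format supported by mathjs"
      ),
  }),
  execute: async ({ expression }) => {
    // mathjs evaluate(expr, scope) treats its optional second argument as a
    // scope of variables, not as configuration, so only the expression is passed.
    const result = evaluate(expression);
    // Return a string so the model receives plain text as the tool result.
    return result.toString();
  },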
});

This tool is called math. Its description tells the model what the tool is for. The parameters field is a zod schema for the tool's arguments: here, a single string called expression that holds the math problem to solve. The AI SDK passes this schema, including the .describe() text, to the model, and validates the arguments the model sends against it before execute runs.
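To see what that validation means in practice, the sketch below rebuilds the same schema on its own and checks two candidate argument objects with zod's safeParse. mathParams is only a local name for illustration; inside the tool the schema is passed directly as parameters.

```typescript
import { z } from "zod";

// The same shape the math tool declares as its parameters.
const mathParams = z.object({
  expression: z
    .string()
    .describe("The mathematical expression to evaluate in a format supported by mathjs"),
});

// A well-formed argument object passes validation...
const ok = mathParams.safeParse({ expression: "pi * 5^2 * 10" });
console.log(ok.success); // true

// ...while a number where a string is expected is rejected.
const bad = mathParams.safeParse({ expression: 785.4 });
console.log(bad.success); // false
```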

The execute part is where the tool does its work. It hands the expression to the evaluate function from mathjs, which parses and computes it, and returns the result as a string. That string is what the model receives back as the tool's output.
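mathjs's evaluate(expr, scope) reads its second argument as a scope of variables, so an options object such as { number: "BigNumber" } passed there does not turn on arbitrary precision. If you want BigNumber arithmetic, a sketch that configures a separate mathjs instance with create looks like this. evaluateExact is a hypothetical name, and precision: 64 is an illustrative choice:

```typescript
import { create, all } from "mathjs";

// Precision is configured on a mathjs instance, not per evaluate() call.
// precision is the number of significant digits BigNumbers keep.
const bigMath = create(all, { number: "BigNumber", precision: 64 });

// Hypothetical helper: evaluate with BigNumber arithmetic and return a string.
function evaluateExact(expression: string): string {
  const result = bigMath.evaluate(expression);
  return result.toString();
}

// Plain JavaScript numbers carry binary floating-point rounding error...
console.log(0.1 + 0.2); // 0.30000000000000004
// ...while the BigNumber instance computes the decimal sum exactly.
console.log(evaluateExact("0.1 + 0.2")); // 0.3
```

Inside the tool, execute would then return evaluateExact(expression). BigNumber arithmetic is slower than native numbers and not every mathjs function supports it, so it is worth enabling only when the extra precision matters.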

Using the Math Tool

Finally, we use the math tool in a function called main. Here's how it works:

async function main() {
  const result = await generateText({
    model: openai("gpt-4o"),
    system:
      "You are a math expert. When you are asked to evaluate a mathematical expression, use the math tool to evaluate it. Finally, once you have the result, return it as a string.",
    tools: { math },
    maxSteps: 10,
    prompt:
      "Calculate the volume of a cylinder with a radius of 5 meters and a height of 10 meters.",
  });

  console.log(`Output: ${result.text}`);
}

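main().catch((error) => {
  // Report failures (e.g. a missing API key) and signal them via the exit code.
  console.error(error);
  process.exitCode = 1;
});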

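In this function, generateText sends the request to the gpt-4o model. The system message tells the model to act as a math expert and to use the math tool whenever it has to evaluate an expression. tools: { math } makes the tool available to the model under the name math, and the prompt is the problem to solve.

The model decides on its own whether to call the tool. When it does, the SDK runs execute and passes the result back so the model can continue. maxSteps: 10 allows up to ten such model steps; with the default of a single step, generation would stop right after the tool call and result.text would be empty. Once the model writes its final answer, we print result.text to the console.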
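The maxSteps option in main bounds a loop: the model can request a tool call, receive the result, and continue. To make that loop concrete, here is a deliberately simplified model of it in plain TypeScript. runToolLoop, fakeModel, and the message types are hypothetical stand-ins, not the AI SDK's implementation; the scripted model asks for the tool once and then answers using its result.

```typescript
// A simplified model of a multi-step tool loop. The types, fakeModel and
// runToolLoop are hypothetical; this is not the AI SDK's code.
type ToolArgs = { expression: string };
type ToolCall = { toolName: string; args: ToolArgs };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "tool"; content: string };
type ToolFn = (args: ToolArgs) => string;

// Scripted stand-in for the model: it requests the math tool once, then
// answers using the tool result it finds in the conversation history.
function fakeModel(history: Message[]): ModelTurn {
  const toolMessage = history.find((m) => m.role === "tool");
  if (toolMessage === undefined) {
    return {
      text: "",
      toolCalls: [{ toolName: "math", args: { expression: "pi * 5^2 * 10" } }],
    };
  }
  return { text: `The volume is ${toolMessage.content} cubic meters.`, toolCalls: [] };
}

// Each iteration is one "step": ask the model, run any requested tools and
// append their results, and stop as soon as the model replies without tools.
function runToolLoop(
  prompt: string,
  tools: Record<string, ToolFn | undefined>,
  maxSteps: number,
): string {
  const history: Message[] = [{ role: "user", content: prompt }];
  let text = "";
  for (let step = 0; step < maxSteps; step++) {
    const turn = fakeModel(history);
    text = turn.text;
    if (turn.toolCalls.length === 0) break; // final answer reached
    for (const call of turn.toolCalls) {
      const run = tools[call.toolName];
      const result = run ? run(call.args) : `Unknown tool: ${call.toolName}`;
      history.push({ role: "tool", content: result });
    }
  }
  return text;
}

// Stand-in for the real math tool: returns a fixed value instead of calling mathjs.
const tools: Record<string, ToolFn | undefined> = { math: () => "785.398" };

console.log(runToolLoop("Cylinder volume, r = 5 m, h = 10 m?", tools, 10));
// The volume is 785.398 cubic meters.
console.log(JSON.stringify(runToolLoop("Cylinder volume, r = 5 m, h = 10 m?", tools, 1)));
// "" (the loop ends right after the tool call, before the model answers)
```

The second call shows why a one-step limit leaves the final text empty when the model's first move is a tool call.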

Output

Running the code prints something like the following (the wording comes from the model and can differ between runs):

Output: The volume of the cylinder is approximately 785.4 cubic meters.
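The reported figure can be checked by hand with the formula for the volume of a cylinder, using the radius and height from the prompt:

$$
V = \pi r^2 h = \pi \times (5\ \text{m})^2 \times 10\ \text{m} = 250\pi\ \text{m}^3 \approx 785.4\ \text{m}^3
$$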

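Using a tool this way makes the arithmetic itself reliable: mathjs evaluates the expression deterministically, so the model no longer has to predict the digits on its own. It does not guarantee a correct answer by itself, though. The model still has to pick the right formula, turn it into a valid expression, and report the tool's result faithfully, which is why this approach works best with a clear system prompt and a look at which expressions were actually evaluated.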
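Because the model decides whether to call the tool, it is useful to confirm that it actually did. After generateText resolves, result.steps lists every step along with the tool calls made in it. The mathExpressions helper below is hypothetical, and its types assume the AI SDK 4.x field names (toolName, args); it collects the expressions the model sent to the math tool so you can log them or flag an answer produced without the tool.

```typescript
// Minimal structural type for what this helper reads from result.steps.
// Field names follow the AI SDK 4.x result shape (toolCalls[].toolName/args);
// check them against the SDK version you have installed.
type StepLike = {
  toolCalls: Array<{ toolName: string; args: unknown }>;
};

// Hypothetical helper: collect every expression the model sent to the math tool.
function mathExpressions(steps: StepLike[]): string[] {
  const expressions: string[] = [];
  for (const step of steps) {
    for (const call of step.toolCalls) {
      const args = call.args as { expression?: unknown } | null;
      if (call.toolName === "math" && typeof args?.expression === "string") {
        expressions.push(args.expression);
      }
    }
  }
  return expressions;
}

// Usage inside main(), after generateText resolves:
// const used = mathExpressions(result.steps);
// if (used.length === 0) console.warn("The model answered without calling the math tool.");
// else console.log("Expressions evaluated:", used);
```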