Problems encountered when building code execution tools based on Function Calling

Exploring how AI models execute complex code and accomplish intelligent tasks.
Core content:
1. Advanced capabilities of AI models: understanding intent and executing complex instructions
2. Implementing the Function Calling mechanism to allow the model to call tools and run code
3. How the backend system parses and executes the Python code requested by the model
In the process of developing MolaGPT, I have always believed that plain text conversation is only one manifestation of a model's capabilities. What is really exciting is a model that can actively understand intent, plan like a human, and execute complex instructions to solve hard problems: calling tools, generating charts, analyzing data.
To achieve this, you need to build a complete "toolbox" for the model; the language model itself must be capable enough to call the tools, and those capabilities require matching backend interfaces and stable, structured outputs.
During this period, I finally had the opportunity to develop an Agent based on Function Calling (which now seems to have been renamed Tool Calling). My goals were clear: first, give the model the ability to search the web; second, and more importantly, let the model run Python code.
Hand-rolling my own OpenAI-style Function Calling
The core idea of my backend is inspired by OpenAI's Function Calling mechanism. The principle can be summarized as follows: when the model recognizes that the user's intention needs to be completed with the help of external tools, it no longer directly generates the final answer, but generates a structured JSON object. This JSON object accurately describes the name of the function to be called (such as execute_python_code) and the parameters required to execute the function (for example, a Python code string).
This structured request is then sent to my backend system. After receiving this request, the backend parses the JSON content, matches the predefined list of available functions ($available_functions) according to the name field, and finds the corresponding processing logic (for example, execute_python_code). After the backend completes the task, it returns the execution result (such as the standard output or error message of the code) to the model for a second request, and then the model combines the context to generate the final reply to the user.
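The dispatch step described above can be sketched in a few lines. Below is a minimal Python sketch of the idea (my production backend is PHP, and the real execute_python_code runs inside a Docker sandbox rather than a bare exec(); the handlers here are simplified stand-ins for illustration):

```python
import contextlib
import io
import json

# Stand-in handlers: the real backend implements these in PHP, and the real
# code handler executes inside an isolated Docker sandbox, not a bare exec().
def execute_python_code(code):
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def search_web(query):
    return f"(search results for: {query})"

available_functions = {
    "execute_python_code": execute_python_code,
    "search_web": search_web,
}

def dispatch(tool_call_json):
    """Parse the model's tool-call JSON, look up the handler by name, run it."""
    call = json.loads(tool_call_json)
    handler = available_functions[call["name"]]
    return handler(**call["arguments"])

print(dispatch('{"name": "execute_python_code", "arguments": {"code": "print(2+2)"}}'))
# prints: 4
```

The returned string is what gets fed back to the model in the second request, so it can compose the final answer from the tool output.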
For example, if the model determines that the Python code print(2+2) needs to be executed, it will send a JSON similar to the following to the backend:
{
  "name": "execute_python_code",
  "arguments": { "code": "print(2+2)" }
}
The backend parses the name as execute_python_code, extracts the code field "print(2+2)" in arguments, and then passes it to a Python interpreter in an isolated Docker sandbox environment for execution. After execution, its stdout is captured and the result "4" is returned to the model. This usually involves two interactions with the model: one is the model outputting the Function Call request, and the other is the backend returning the execution result for the model's reference.
In my backend implementation, I defined two core functions for the model:
search_web: Used to search the Internet for the latest information.
execute_python_code: Used to execute Python code, which is the focus of this article.
This is the function description information defined by the backend, which is used to inform the model of the existence and usage of these two tools:
$available_functions = [
    [
        "type" => "function",
        "function" => [
            "name" => "search_web",
            "description" => "Search the web for the latest information and authoritative content",
            "parameters" => [
                "type" => "object",
                "properties" => [
                    "query" => [
                        "type" => "string",
                        "description" => "The search query"
                    ]
                ],
                "required" => ["query"]
            ]
        ]
    ],
    [
        "type" => "function",
        "function" => [
            "name" => "execute_python_code",
            "description" => "Execute a snippet of Python code and return the standard output",
            "parameters" => [
                "type" => "object",
                "properties" => [
                    "code" => [
                        "type" => "string",
                        "description" => "The Python code to execute, e.g. print(2+2)"
                    ]
                ],
                "required" => ["code"]
            ]
        ]
    ]
];
The model will intelligently determine whether these tools need to be called based on the conversation context, and automatically generate corresponding parameter requests for execution by the backend.
Several problems encountered in the process and their solutions
1. Models don’t always use print()
When I initially tested the execute_python_code function, I found that the code generated by the model often failed to execute. After checking the log, I found that the problem was that the code generated by the model often did not explicitly use the print() function to output the final result. The model may have treated the backend Python environment as an interactive environment such as Jupyter Notebook. For example, when I asked the model to calculate 2+2, it might generate the following code and request execution:
2 + 2

# or

result = 2 + 2
result
In interactive environments such as Jupyter Notebook this is indeed fine, but my backend is a plain Docker environment built from the official Python image. When the Python executor runs this code, there is no print() statement, so the standard output is empty. The model therefore never receives the result "4" and concludes that the code execution failed.
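This failure mode is easy to reproduce: if you capture stdout around exec(), a bare trailing expression writes nothing, while an explicit print() does. A minimal sketch, independent of my actual sandbox:

```python
import contextlib
import io

def run_and_capture(code):
    """Execute a code string and return whatever it wrote to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue()

print(repr(run_and_capture("2 + 2")))         # '' -- the bare expression writes nothing
print(repr(run_and_capture("print(2 + 2)")))  # '4\n'
```

Unlike the Python REPL, exec() never echoes the value of an expression statement; only explicit writes to stdout are visible to the backend.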
1.1 Failed attempt: brute force concatenation of print()
My initial idea was simple: could the backend just mechanically wrap the last line in print(...)? I soon realized this approach was too crude. Not every snippet needs its last expression printed: the code may only define functions or import libraries, where no print() belongs at all. Blindly concatenating print() could even produce syntax errors or unexpected behavior.
1.2 Solution 1: Use AST to automatically complete print()
After some research (and consulting a large model), I decided to use an AST (Abstract Syntax Tree) to handle this problem intelligently. An AST parses the code into a tree structure where each node represents a syntax element of the program, such as an expression, a statement, or a function definition. The root of this tree is the entire module, and its child nodes may be assignment statements, function calls, conditionals, and other structures.
The main advantage of using an AST is that it represents the code's syntax and semantics precisely, ignoring non-essential content such as comments and blank lines, which lets the backend reliably locate the last expression and wrap it in print().
import ast

class AutoPrintTransformer(ast.NodeTransformer):
    """
    AST transformer that automatically wraps the last expression in print()
    """
    def __init__(self):
        super().__init__()
        self.modified = False

    def visit_Module(self, node):
        """
        Process the module node and add print() to the last expression
        """
        self.generic_visit(node)
        last_expr_index = -1
        for i in range(len(node.body) - 1, -1, -1):
            stmt = node.body[i]
            # Only a bare expression statement can be wrapped in print();
            # assignments, imports, defs, etc. must be left alone
            if not isinstance(stmt, ast.Expr):
                break
            # Is it already a print() call?
            is_print_stmt = (
                isinstance(stmt.value, ast.Call)
                and isinstance(stmt.value.func, ast.Name)
                and stmt.value.func.id == 'print'
            )
            # Is it a module docstring?
            is_docstring = (
                isinstance(stmt.value, ast.Constant)
                and isinstance(stmt.value.value, str)
                and i == 0
            )
            if not is_print_stmt and not is_docstring:
                last_expr_index = i
                break
        # Found an expression whose value should be printed
        if last_expr_index != -1:
            expr_node = node.body[last_expr_index]
            print_call = ast.Call(
                func=ast.Name(id='print', ctx=ast.Load()),
                args=[expr_node.value],
                keywords=[],
            )
            print_stmt = ast.Expr(value=print_call)
            ast.copy_location(print_stmt, expr_node)
            node.body[last_expr_index] = print_stmt
            self.modified = True
        return node
The main execution flow of this code is:
- Parse: use Python's built-in ast library and ast.parse() to turn the code string generated by the model into an AST object.
- Traverse the tree: walk the list of top-level statement nodes of the AST (node.body).
- Locate the last expression statement: find the last statement node of type ast.Expr.
- Check whether print() is missing: determine whether that expression statement is already a print() call, or a docstring constant.
- Wrap: if it is an ordinary expression whose result should be printed, wrap its AST node in a new print() call node and replace the original expression statement node.
- Generate new code and execute: use ast.unparse() to convert the modified AST back into a Python source string, or compile and execute it directly.
In this way, even if the model does not write print() itself, the backend can locate the final expression and fill in print() dynamically, ensuring that the calculation result is captured and returned correctly.
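For a concrete feel of the parse, wrap, and unparse pipeline, here is a condensed, self-contained sketch of the same idea (a simplification for illustration, with the docstring check omitted; it is not the production transformer):

```python
import ast

def add_auto_print(src):
    """Wrap the last bare expression of src in print() (simplified sketch)."""
    tree = ast.parse(src)
    last = tree.body[-1]
    # Only wrap a bare expression that is not already a print() call
    if isinstance(last, ast.Expr) and not (
        isinstance(last.value, ast.Call)
        and isinstance(last.value.func, ast.Name)
        and last.value.func.id == "print"
    ):
        tree.body[-1] = ast.Expr(
            value=ast.Call(
                func=ast.Name(id="print", ctx=ast.Load()),
                args=[last.value],
                keywords=[],
            )
        )
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

print(add_auto_print("result = 2 + 2\nresult"))
# prints:
# result = 2 + 2
# print(result)
```

Note that ast.unparse() requires Python 3.9 or later; on older versions you would compile the tree directly instead of converting it back to source.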
1.3 New problems encountered later
After going live for a while, I found that PHP had serious problems handling quotes and line breaks in Python code. The backend splices the model's code into a Python template string, and since that code often contains special characters (single quotes, double quotes, backslashes) or complex multi-line structures, the string would terminate prematurely or produce syntax errors. This caught me off guard: in my previous PHP work I had never embedded Python source inside PHP strings, so I had not anticipated the problem.
To solve this once and for all, I decided to Base64-encode the user's Python code in PHP first, ensuring that the string handed to the Python environment arrives intact, untouched by special-character escaping.
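Base64 is the same standard on both sides: PHP's base64_encode output decodes byte-for-byte with Python's base64.b64decode. A quick round-trip sketch (both directions shown in Python here for testability):

```python
import base64

# A snippet full of characters that would break naive string splicing
code = 'print("it\'s tricky")\nprint(2 + 2)'

b64 = base64.b64encode(code.encode("utf-8")).decode("ascii")
restored = base64.b64decode(b64).decode("utf-8")

assert restored == code  # quotes and newlines survive the round trip intact
print(b64)
```

Because the Base64 alphabet contains no quotes, backslashes, or newlines, the encoded string can be safely embedded in any template without escaping.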
1.4 Current solution: Use Base64 to maintain code integrity, and then use AST to automatically complete print()
After receiving the Python script submitted by the model, the backend encodes it with PHP's built-in base64_encode function:
$b64 = base64_encode($user_python_code);
The Python-side template no longer wraps the user script in triple quotes. Instead it receives the Base64 string, restores the original code with Python's built-in base64.b64decode(), and then runs it through the AST-based auto-print logic developed earlier:
import ast
import base64
import traceback

class AutoPrintTransformer(ast.NodeTransformer):
    def __init__(self):
        self.has_modified = False

    def visit_Module(self, node):
        self.generic_visit(node)
        last_expr = None
        # Walk backwards to find the last bare expression (skipping string constants)
        for i in range(len(node.body) - 1, -1, -1):
            stmt = node.body[i]
            if isinstance(stmt, ast.Expr) and not (
                isinstance(stmt.value, ast.Constant) and isinstance(stmt.value.value, str)
            ):
                last_expr = (i, stmt)
                break
        if last_expr:
            idx, expr = last_expr
            is_print = (
                isinstance(expr.value, ast.Call)
                and isinstance(expr.value.func, ast.Name)
                and expr.value.func.id == 'print'
            )
            if not is_print:
                print_node = ast.Expr(
                    value=ast.Call(
                        func=ast.Name(id='print', ctx=ast.Load()),
                        args=[expr.value],
                        keywords=[],
                    )
                )
                ast.copy_location(print_node, expr)
                node.body[idx] = print_node
                self.has_modified = True
        return node

def transform_code(src):
    try:
        tree = ast.parse(src)
        transformer = AutoPrintTransformer()
        new_tree = transformer.visit(tree)
        ast.fix_missing_locations(new_tree)
        if transformer.has_modified:
            code_obj = compile(new_tree, '<string>', 'exec')
        else:
            code_obj = compile(src, '<string>', 'exec')
    except Exception as e:
        # Fall back to the raw source if parsing or transforming fails; only
        # the transform is inside the try, so user code is never run twice
        print(f"Error while processing code: {e}")
        traceback.print_exc()
        code_obj = compile(src, '<string>', 'exec')
    exec(code_obj, globals())

# Decode from Base64 to recover the original user code
user_code = base64.b64decode('<Base64_encoded_string>').decode('utf-8')
transform_code(user_code)
After this solution went live, the accidental string truncation between the PHP and Python layers disappeared, and Python code with complex multi-line structures executes correctly.
2. Matplotlib drawing results cannot be displayed
To enhance the code execution capability, I deliberately baked the matplotlib library into the Docker image, expecting the model to generate data visualization charts. In testing, the model did generate standard plotting code (such as drawing a sine wave) and duly called plt.show(), yet after execution no image appeared in the front-end chat interface. On reflection, this is expected: plt.show() tries to open a window in a graphical (GUI) environment. Inside the Docker sandbox there is no display or window system, so in a purely headless environment plt.show() fails silently or raises an error.
Some readers may be wondering (that reader was me): can't we just tell the model in the prompt to save the image? But that raises a new problem: even if the model calls plt.savefig('plot.png'), the file exists only inside the container and cannot be reached from outside. Worse, once the code finishes and the container is destroyed, the data is gone.
2.1 Solution: inject code dynamically, then traverse and extract the figures
Pre-injection: before executing the model's code, inject a patch at the top that switches matplotlib to a non-interactive backend and tracks every Figure object created:
import os
import uuid
import matplotlib
matplotlib.use('Agg')  # Use a non-interactive backend
import matplotlib.pyplot as plt

fig_list = []  # Used to keep track of all created figures
orig_figure = plt.figure  # Save the original plt.figure function

def track_figure(*args, **kwargs):
    """Create a figure via the original plt.figure and remember it."""
    fig = orig_figure(*args, **kwargs)
    fig_list.append(fig)
    return fig

plt.figure = track_figure
Post-injection: after the model's code finishes, iterate over all tracked figure objects, save each as a PNG file, transfer them to a host directory, and convert the paths into publicly accessible image links:
...
saved_paths = []
if 'fig_list' in globals() and fig_list:
    output_dir = "/output"  # Output directory
    os.makedirs(output_dir, exist_ok=True)
    for i, fig in enumerate(fig_list):
        unique_id = str(uuid.uuid4())[:8]  # Generate a short unique ID
        filename = os.path.join(output_dir, f"chart_{i}_{unique_id}.png")  # Construct file name
        try:
            fig.tight_layout()
        except Exception:
            pass
        try:
            fig.savefig(filename, dpi=100, bbox_inches='tight')  # Save image
            saved_paths.append(filename)  # Add path to the list
        except Exception as e:
            print(f"Error saving figure: {e}")
        finally:
            plt.close(fig)  # Close the figure
if saved_paths:
    for img_path in saved_paths:
        print(f"###IMAGE_FILE_PATH###{img_path}")
else:
    print("No chart.")
plt.close('all')
Model-side handling: the sentinel string ###IMAGE_FILE_PATH### tells the model where an image was produced, and the prompt instructs it in natural language to emit the chart as a Markdown image link:
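If you prefer not to rely on the model for this conversion, the backend can also rewrite the marker lines deterministically before handing the output back. A sketch under assumed names (render_stdout, url_base, and the example URL are hypothetical, not my actual implementation):

```python
MARKER = "###IMAGE_FILE_PATH###"

def render_stdout(stdout, url_base="https://example.com/output"):
    """Replace sandbox marker lines with Markdown image links (hypothetical url_base)."""
    lines = []
    for line in stdout.splitlines():
        if line.startswith(MARKER):
            # Extract the container path and map the file name onto the public URL
            path = line[len(MARKER):].strip()
            name = path.rsplit("/", 1)[-1]
            lines.append(f"![chart]({url_base}/{name})")
        else:
            lines.append(line)
    return "\n".join(lines)

print(render_stdout("###IMAGE_FILE_PATH###/output/chart_0_ab12cd34.png"))
```

Doing the rewrite in the backend makes the chart display independent of how faithfully the model follows the prompt.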

After a series of processing, the image drawn by the model can be displayed to the user on the front end.
As shown in the figure above, the user asked the model to draw a regular tetrahedron. The model understood the intent, produced a clean plot, and the backend's post-processing logic successfully published it to the public-facing directory.
3. Summary
No matter how the temperature parameter is set, the model may still omit print(), and matplotlib figures may still fail to display on their own. It falls to the developer to design matching pre-processing, execution, and post-processing mechanisms in the backend: mechanisms that compensate for these limitations in Agent development and quietly "clean up" after the model.