Unlocking the Power of Gemini API's Code Execution Feature

Unleash the power of Gemini API's code execution feature. Discover how it empowers developers to build applications with code-based reasoning, solve equations, and process text. Explore examples showcasing the capabilities of this innovative feature, from generating prime numbers to web scraping and machine learning model creation. Gain insights into the differences between code execution and function calling, and learn how to leverage this game-changing tool in your workflows.

February 20, 2025

Unlock the power of code-based reasoning with the Gemini API's new Code Execution feature. Seamlessly integrate this capability into your applications, enabling them to generate, execute, and learn from code - solving complex problems with ease. Discover how this innovative feature can transform your development workflows and unlock new possibilities.

Powerful Capability: Code Execution on Gemini API

Google's Gemini API offers a unique feature called "code execution" that enables the model to generate and run Python code, and learn iteratively from the results until it arrives at the final output. This powerful capability allows developers to build applications that benefit from code-based reasoning, such as solving equations or processing text.
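As a rough illustration of how the feature is switched on, the sketch below builds the JSON body for a `generateContent` REST request with the code execution tool enabled. The helper name `build_request_body` is hypothetical, and the field names (`tools`, `code_execution`, `contents`, `parts`) are an assumption based on the public REST examples, so check them against the current API reference. No request is actually sent.

```python
import json


def build_request_body(prompt: str) -> dict:
    """Build a generateContent request body with code execution enabled.

    Assumption: an empty `code_execution` object inside `tools`
    is what turns the feature on for the request.
    """
    return {
        "tools": [{"code_execution": {}}],
        "contents": [{"parts": [{"text": prompt}]}],
    }


body = build_request_body(
    "What is the sum of the first 200 prime numbers? "
    "Generate and run code for the calculation."
)
# Print the body as it would be serialized for the HTTP request
print(json.dumps(body, indent=2))
```

The point of the sketch is that enabling the feature is a matter of request configuration: the prompt stays plain text and the model decides whether to write code.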

The key advantages of code execution over normal function calling are:

  1. Simplicity: The language model decides whether it needs to write code to perform a certain operation, and it can run the code in the API backend. This is much simpler to use compared to setting up a development environment and making multiple API calls.

  2. Flexibility: With code execution, the model can iterate on the code and refine the output within a single API request, whereas function calling needs an additional request to send back the result of each function call.

  3. Isolation: The code execution happens in a completely isolated environment, which means developers don't need to worry about the underlying infrastructure.

However, there are some limitations to the code execution feature:

  • It is currently limited to Python and a specific set of libraries (NumPy and SciPy).
  • It cannot return artifacts like media files or handle non-text output (e.g., data plots).
  • The code execution is limited to a maximum of 30 seconds, which may not be suitable for all use cases.
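To make the time limit and the text-only output concrete, here is a local sketch that mimics those two constraints with a subprocess. It is not how Google's backend works; `run_with_limits` is a hypothetical helper, and the short timeout in the demo is chosen only so the example finishes quickly.

```python
import subprocess
import sys


def run_with_limits(code: str, timeout_s: float = 30.0) -> dict:
    """Run Python source in a child process and return only its text output."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,  # collect stdout/stderr as the only results
            text=True,
            timeout=timeout_s,    # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "output": f"timed out after {timeout_s} seconds"}
    if proc.returncode != 0:
        return {"ok": False, "output": proc.stderr}
    return {"ok": True, "output": proc.stdout}


fast = run_with_limits("print(sum(range(10)))")
slow = run_with_limits("import time; time.sleep(5)", timeout_s=0.5)
print(fast)
print(slow)
```

Anything the child process does not print (a figure, a file) never reaches the caller, which is the same reason plots and media files cannot come back from the feature.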

Despite these limitations, the code execution feature can be extremely powerful for developers, especially when building agents with Gemini 1.5 Flash or Pro models. The examples in this article show the model generating and executing code for mathematical calculations, string manipulation, data analysis, and machine learning model training.

By leveraging the code execution capability, developers can create more sophisticated and capable applications that can benefit from the model's reasoning and iterative problem-solving abilities.

Understanding Code Execution vs. Function Calling

The key differences between code execution and normal function calling in the context of large language models (LLMs) like Gemini are:

  1. Code Execution:

    • The LLM can generate and execute code directly within the API backend.
    • The model decides whether it needs to write code to perform a certain operation and can run the code.
    • It's a single API request, and the code execution happens in the backend, allowing the model to iterate on the solution.
    • Currently limited to Python and specific libraries like NumPy and SciPy.
    • Has limitations: no file I/O, no non-text output, and a 30-second execution time limit.
  2. Function Calling:

    • Allows interaction with real-world APIs or tools using external functions.
    • Requires providing a list of tools the model can access and setting up the development environment.
    • May need to make multiple API calls to achieve a task.
    • Provides more flexibility in terms of language, framework, and functionality.
    • Requires more setup and management of the external environment.

Google recommends using code execution if the task can be performed within the provided capabilities, as it is simpler to use and doesn't require managing the external environment. However, function calling offers more flexibility when the task requires access to external resources or functionality not available in the code execution environment.
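The extra round trip that function calling needs can be sketched locally. In the example below, `get_exchange_rate`, its fixed return value, the `LOCAL_TOOLS` registry, and the sample `functionCall` part are all made up for illustration; the `functionCall` and `functionResponse` field names follow the REST shape as an assumption. With code execution there is no equivalent step, because the backend runs the code itself.

```python
def get_exchange_rate(base: str, target: str) -> dict:
    """Hypothetical local tool; a real one would call an external service."""
    return {"base": base, "target": target, "rate": 0.5}


# Functions the application exposes to the model (hypothetical registry)
LOCAL_TOOLS = {"get_exchange_rate": get_exchange_rate}


def answer_function_call(part: dict) -> dict:
    """Run the function the model asked for and build the reply part.

    The reply has to be sent back in a second request so the model
    can finish its answer.
    """
    call = part["functionCall"]
    result = LOCAL_TOOLS[call["name"]](**call["args"])
    return {"functionResponse": {"name": call["name"], "response": result}}


# Hand-written stand-in for a part returned by the first request
model_part = {
    "functionCall": {
        "name": "get_exchange_rate",
        "args": {"base": "USD", "target": "EUR"},
    }
}
reply_part = answer_function_call(model_part)
print(reply_part)
```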

Exploring Code Execution Examples

The following examples show the kind of Python code the model generates and runs when code execution is enabled.

Simple Mathematics

The Gemini API can generate and execute code to perform basic mathematical operations, such as calculating the sum of the first 200 prime numbers.

# Code of the kind the model generates and runs for this prompt
import math

primes = []
num = 2
while len(primes) < 200:
    # num is prime if no integer from 2 up to its square root divides it
    is_prime = True
    for i in range(2, int(math.sqrt(num)) + 1):
        if num % i == 0:
            is_prime = False
            break
    if is_prime:
        primes.append(num)
    num += 1

total_sum = sum(primes)
print(f"The sum of the first 200 prime numbers is: {total_sum}")

The output shows the calculated sum of the first 200 prime numbers.

String Manipulation

The Gemini API can also generate and execute code to perform various string manipulation tasks, such as converting a string to uppercase, counting the number of "o" characters, and reversing the string.

# Code of the kind the model generates and runs for this prompt
text = "hello world, welcome to Gemini API"

# Convert to uppercase
upper_text = text.upper()
print(f"Uppercase text: {upper_text}")

# Count the number of 'o' characters
o_count = text.count('o')
print(f"Number of 'o' characters: {o_count}")

# Reverse the string
reversed_text = text[::-1]
print(f"Reversed text: {reversed_text}")

The output shows the results of the string manipulation tasks.

Data Analysis

The Gemini API can generate and execute code to perform basic data analysis tasks, such as generating random numbers, calculating statistics (mean, median, mode), and creating a histogram.

# Code of the kind the model generates for this prompt
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats  # needed for stats.mode below

# Generate 1,000 random integers between 100 and 1000 (inclusive)
numbers = np.random.randint(100, 1001, size=1000)

# Calculate statistics
mean = np.mean(numbers)
median = np.median(numbers)
mode = stats.mode(numbers)[0]
min_value = np.min(numbers)
max_value = np.max(numbers)
total_sum = np.sum(numbers)

print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Mode: {mode}")
print(f"Minimum: {min_value}")
print(f"Maximum: {max_value}")
print(f"Sum: {total_sum}")

# Create a histogram (the API cannot return the figure; run this part locally)
plt.hist(numbers, bins=30)
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram of Random Numbers")
plt.show()

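The output includes the calculated statistics. The histogram itself is not returned, because code execution cannot return non-text output; to see the plot, run the generated plotting code locally.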

These examples demonstrate the versatility of the Gemini API's code execution feature, allowing developers to leverage the model's capabilities to solve a wide range of problems efficiently.
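When reading a code execution response, the generated code and its output arrive as separate parts alongside the model's prose. The sketch below walks a hand-written sample response; the `executableCode` and `codeExecutionResult` field names and the `OUTCOME_OK` value are assumptions based on the REST response shape, and `summarize_parts` is a hypothetical helper.

```python
def summarize_parts(response: dict) -> list:
    """Return (kind, content) pairs for each part of the first candidate."""
    summary = []
    for part in response["candidates"][0]["content"]["parts"]:
        if "text" in part:
            summary.append(("text", part["text"]))
        elif "executableCode" in part:
            summary.append(("code", part["executableCode"]["code"]))
        elif "codeExecutionResult" in part:
            result = part["codeExecutionResult"]
            summary.append((result["outcome"], result["output"]))
    return summary


# Hand-written sample, not real API output
sample_response = {
    "candidates": [{
        "content": {
            "parts": [
                {"text": "I will reverse the string with Python."},
                {"executableCode": {"language": "PYTHON",
                                    "code": "print('Gemini'[::-1])"}},
                {"codeExecutionResult": {"outcome": "OUTCOME_OK",
                                         "output": "inimeG\n"}},
                {"text": "The reversed string is inimeG."},
            ]
        }
    }]
}

parts_summary = summarize_parts(sample_response)
for kind, content in parts_summary:
    print(f"{kind}: {content.strip()}")
```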

Generating Plots and Running ML Models

The Gemini API's code execution feature allows developers to not only generate code, but also execute it within the API's backend. This capability extends beyond simple mathematical operations or string manipulations, enabling the generation of data visualizations and the training of machine learning models.

When testing the code execution feature, the example prompts included a request to create a histogram plot. While the API was able to generate the necessary Python code to produce the plot, it was not able to directly return the plot artifact. However, the generated code can be executed locally, allowing the developer to generate the desired visualization.

Similarly, the API demonstrated the ability to generate synthetic data, split it into training and test sets, create and train a linear regression model, and evaluate the model's performance on the test set. Again, the API returned the Python code to accomplish these tasks, which the developer can then run locally to obtain the final results.
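The article does not reproduce the code the model generated for this task, so the following is only a sketch of the same steps under stated assumptions: the true relationship (y = 3x + 2), the noise level, the 80/20 split, and the fixed seed are all illustrative choices, and the fit uses NumPy least squares rather than a machine learning library.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data: y = 3x + 2 plus Gaussian noise
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=200)

# Shuffle the indices, then hold out the last 20% as a test set
indices = rng.permutation(len(x))
split = int(0.8 * len(x))
train_idx, test_idx = indices[:split], indices[split:]

# Fit slope and intercept by ordinary least squares on the training set
design = np.column_stack([x[train_idx], np.ones(split)])
(slope, intercept), *_ = np.linalg.lstsq(design, y[train_idx], rcond=None)

# Evaluate on the held-out test set
predictions = slope * x[test_idx] + intercept
residuals = y[test_idx] - predictions
mse = float(np.mean(residuals ** 2))
total_var = float(np.sum((y[test_idx] - y[test_idx].mean()) ** 2))
r2 = 1.0 - float(np.sum(residuals ** 2)) / total_var

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"test MSE={mse:.3f}, test R^2={r2:.3f}")
```

Because every result here is printed as text, a task like this fits within the feature's text-only output constraint, unlike the histogram.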

These examples showcase the versatility of the Gemini API's code execution feature. Developers can leverage this capability to build applications that require advanced data processing, visualization, and machine learning capabilities, without the need to manage the underlying infrastructure or set up complex development environments. The API handles the code generation and execution, allowing developers to focus on the high-level problem-solving and application design.

Pricing and Limitations of Gemini API Code Execution

The Gemini API offers a free tier for developers to explore the code execution feature. However, there are some limitations to keep in mind:

  • Request Limits: The free tier has limits on the number of requests you can make per minute. This is to prevent abuse and ensure fair usage of the API.

  • Execution Time: The code execution feature has a maximum runtime of 30 seconds. Any code that takes longer than that will time out.

  • Supported Libraries: The code execution environment has access to a limited set of libraries, primarily NumPy and SciPy. More complex libraries or custom packages are not supported.

  • No File I/O or Non-Text Output: The code execution feature does not support file I/O operations or generating non-text output like media files. This means you cannot use it for tasks that require these capabilities.

  • Potential Impact on Other Features: Enabling code execution can sometimes have a negative impact on the performance or quality of other model outputs, such as generating stories or essays. This is something to keep in mind when using the feature.

For developers who need higher usage limits, Google offers paid tiers of the Gemini API with increased request limits. The 30-second timeout and the limited library set listed above are properties of the execution environment, so do not assume they change with the pricing tier.

Overall, the code execution feature in the Gemini API can be a powerful tool for developers, but it's important to understand its limitations and pricing structure to ensure it fits your use case and budget.
