Unleash Autonomous Agents with Qwen-Agent and the Best Open-Source Models
Explore the power of Qwen-Agent and the open-source Qwen 2 models, and learn how to build autonomous agents through function calling and custom agent creation. Discover the impact of quantization on model performance for real-world applications.
February 14, 2025

Unlock the power of autonomous agents with Qwen-Agent, the cutting-edge open-source framework that leverages the best open-weight models available. Discover how to seamlessly integrate function calling and agent-based workflows to build intelligent applications that can interact with the real world and adapt to user needs.
Build Autonomous Agents with the Best Open-Weight Model
Function Calling and Agents: Understanding the Differences
Getting Started with Qwen Agents: Function Calling and Agent Usage
The Impact of Quantization on Large Language Model Performance
Build Autonomous Agents with the Best Open-Weight Model
Alibaba's Qwen 2 models are the latest generation of open-weight language models, offering impressive capabilities across a wide range of tasks. The family ranges from 0.5 billion to 72 billion parameters, with the larger models supporting context windows of up to 128,000 tokens - a significant improvement over the 8K-token limits of earlier models such as the original GPT-4.
One of the key features of Qwen 2 is its strong performance on coding and mathematics, as well as its ability to handle long-context understanding - crucial for real-world applications. Additionally, the models support a diverse set of languages, including a focus on Middle Eastern and Southeast Asian languages, which is a welcome change from the Western-language-centric focus of many other models.
To leverage the power of Qwen 2, we can use the Qwen-Agent framework, which provides access to a built-in browser assistant, a code interpreter, and the ability to create custom assistants. This allows us to build autonomous agents that can plan, execute, and adapt their actions based on the task at hand.
In this section, we'll explore how to use Qwen-Agent to create a custom image generation agent. The agent will be able to generate images based on user input, download the generated images, and even update its own code if it encounters any issues. By combining the powerful language understanding of Qwen 2 with the planning and execution capabilities of Qwen-Agent, we can create truly autonomous and capable agents that can tackle a wide range of tasks.
Function Calling and Agents: Understanding the Differences
Function calling and agents are two distinct concepts in the world of large language models (LLMs). Here's a concise explanation of the differences between the two:
Function Calling (Tool Usage):
- Function calling, or tool usage, allows the LLM to interact with the external world by accessing external APIs or functions.
- The LLM determines which function to use based on the user input, generates the necessary inputs for the function, and returns the results to the user.
- However, the LLM itself cannot execute the function call; the user or a separate system must make the actual function call and return the results to the LLM.
Agents:
- Agents are more sophisticated systems built around LLMs that have access to a set of tools, just as in function calling.
- Agents can also perform planning, decompose tasks into sub-goals, and execute actions using the available tools.
- Agents have access to both short-term and long-term memory, allowing them to keep track of their progress and plan their next steps accordingly.
- Agents are critical for making LLMs truly useful in real-world applications, as they can autonomously perform complex tasks.
In summary, function calling is a more limited interaction where the LLM can only generate the necessary inputs for a function, while agents have the ability to plan, execute, and adapt their actions to achieve a desired outcome.
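To make the distinction concrete, here is a minimal, self-contained sketch of the plan-act-observe loop an agent runs internally. The helper names and the toy 'search' tool are purely illustrative; frameworks such as Qwen-Agent implement this loop for you.
from dataclasses import dataclass

@dataclass
class Step:
    tool_name: str | None  # None means the agent has finished
    arguments: dict
    final_answer: str | None = None

def plan_next_step(goal: str, memory: list) -> Step:
    # Stand-in for the LLM call: a real agent asks the model what to do next,
    # given the goal and every observation gathered so far.
    if memory:
        return Step(None, {}, final_answer=f"Done: {memory[-1]['observation']}")
    return Step('search', {'query': goal})

tools = {'search': lambda query: f"results for {query!r}"}  # toy tool

def run_agent(goal: str) -> str:
    memory = []  # short-term memory: actions taken and what they returned
    while True:
        step = plan_next_step(goal, memory)
        if step.tool_name is None:
            return step.final_answer
        # The agent itself executes the chosen tool and observes the result...
        observation = tools[step.tool_name](**step.arguments)
        # ...then records it so the next planning step can build on it
        memory.append({'action': step.tool_name, 'observation': observation})

print(run_agent('Find the weather in Paris'))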
Getting Started with Qwen Agents: Function Calling and Agent Usage
To get started with Qwen agents, we'll be using the 72-billion-parameter version of Qwen 2 and running it locally using Ollama. You can also use an external API. To install Qwen-Agent itself, we have two options:
- Install the package using pip as an independent Python package.
- Clone the repo and run the installation locally if you want the latest development version.
I'll be running it locally using Ollama. First, start an Ollama server and pull the model with the ollama run command shown below. This will download the model weights, which may take some time depending on your internet speed.
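Assuming Ollama is already installed, the commands look like this (the exact model tag may vary across Ollama releases):
ollama serve
ollama run qwen2:72b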
Next, create a virtual environment using conda and activate it:
conda create -n qwen python=3.10
conda activate qwen
Now, install the Qwen Agent package using pip:
pip install qwen-agent
We'll start with function calling. The model needs to pick the function to use and figure out its inputs, then hand them back to our Python code. Our code executes the function, gets the response, and feeds it back to the LLM.
Here's an example of function calling for getting the current weather. This is a sketch based on Qwen-Agent's function-calling interface; the model name and endpoint assume the local Ollama setup above:
from qwen_agent.llm import get_chat_model

# Create the LLM instance, pointing at Ollama's OpenAI-compatible endpoint
llm = get_chat_model({
    'model': 'qwen2:72b',
    'model_server': 'http://localhost:11434/v1',
    'api_key': 'EMPTY',
})

# User message
messages = [{'role': 'user', 'content': "What's the current weather like in Paris?"}]

# Describe the function; the model decides whether to call it and with which arguments
functions = [{
    'name': 'get_current_weather',
    'description': 'Get the current weather for a given location',
    'parameters': {
        'type': 'object',
        'properties': {
            'location': {'type': 'string', 'description': 'The city, e.g. Paris'},
            'unit': {'type': 'string', 'enum': ['celsius', 'fahrenheit']},
        },
        'required': ['location'],
    },
}]

# The model responds with a function_call; executing it is up to our code
responses = llm.chat(messages=messages, functions=functions, stream=False)
print(responses)
Rather than the weather itself, the model returns a function_call with the arguments it chose (here, the location "Paris"); our code then executes the function and feeds the result back so the model can produce the final answer.
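A sketch of that follow-up step, where get_current_weather is a stand-in you would implement yourself:
import json

# Execute the function the model requested and feed the result back
last = responses[-1]
if last.get('function_call'):
    args = json.loads(last['function_call']['arguments'])
    weather = get_current_weather(**args)  # your own implementation
    messages.append(last)
    messages.append({'role': 'function', 'name': 'get_current_weather', 'content': json.dumps(weather)})
    # Ask the model again, now with the function result in context
    final = llm.chat(messages=messages, functions=functions, stream=False)
    print(final[-1]['content'])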
Now, let's look at an example of using Qwen-Agent. We'll create a custom agent that can generate images and download them to a local folder. The agent will use a custom tool that returns an image URL from the Pollinations.AI API, plus the built-in code interpreter to download the image. This sketch follows Qwen-Agent's tool-registration interface; again, the model name and endpoint assume the local Ollama setup:
import urllib.parse
import json5
from qwen_agent.agents import Assistant
from qwen_agent.tools.base import BaseTool, register_tool

# Define the custom image generation tool
@register_tool('my_image_gen')
class MyImageGen(BaseTool):
    description = 'Generate an image from a text description using the Pollinations.AI API and return its URL.'
    parameters = [{
        'name': 'prompt',
        'type': 'string',
        'description': 'Detailed description of the desired image content',
        'required': True,
    }]

    def call(self, params: str, **kwargs) -> str:
        # The model passes tool arguments as a JSON string
        prompt = urllib.parse.quote(json5.loads(params)['prompt'])
        # Return the image URL; the agent can download it with the code interpreter
        return json5.dumps({'image_url': f'https://image.pollinations.ai/prompt/{prompt}'}, ensure_ascii=False)

# Create the agent with the custom tool and the built-in code interpreter
agent = Assistant(
    llm={'model': 'qwen2:72b', 'model_server': 'http://localhost:11434/v1', 'api_key': 'EMPTY'},
    function_list=['my_image_gen', 'code_interpreter'],
)

# Ask the agent to generate an image and save it locally
messages = [{'role': 'user', 'content': 'Create an image of a llama wearing sunglasses and save it to an "images" folder.'}]
for response in agent.run(messages=messages):
    pass  # responses stream in; keep the final one
print(response)
The agent will call the custom tool to obtain an image URL, write and run download code in the code interpreter to save the image into the "images" folder, and report the result.
The Impact of Quantization on Large Language Model Performance
In this section, we will explore the impact of quantization on the performance of large language models, specifically focusing on the Qwen 2 models.
The Qwen team has addressed the question of quantization impact in their updated models. They have evaluated Qwen 2 models at different quantization levels - floating-point 16, 8-bit, and 4-bit, including AWQ (Activation-aware Weight Quantization) - on benchmarks such as MMLU, C-Eval, and IFEval.
Based on the average numbers, the following trends can be observed:
- Larger models: For the 72-billion-parameter model, the performance difference between 16-bit and 8-bit quantization is not significant, with the average score holding at around 81 points. The 4-bit quantization, however, shows a more noticeable drop in performance.
- Smaller models: For the smaller variants, such as the 0.5-billion-parameter version, the impact of quantization is more dramatic: 4-bit quantization shows a 5-point drop in the average score compared to the original floating-point 16 version.
The key takeaway is that the impact of quantization is more pronounced on smaller models, while the larger Qwen 2 models can maintain relatively high performance even with 8-bit quantization.
When deploying these models in production, it is generally recommended to use at least 8-bit quantization, and 16-bit if possible, to avoid significant performance degradation, especially for the smaller model variants.
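To see why quantization is worth this trade-off at all, here is a quick back-of-the-envelope calculation of the memory needed just to hold the weights at each precision (weights only; real deployments also need memory for activations and the KV cache):
# Approximate weight-memory footprint of a 72B-parameter model per precision
params = 72e9
for name, bits in [('FP16', 16), ('Int8', 8), ('Int4', 4)]:
    print(f'{name}: ~{params * bits / 8 / 1e9:.0f} GB')
# FP16: ~144 GB, Int8: ~72 GB, Int4: ~36 GB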
In summary, the Qwen team's analysis highlights the importance of carefully considering the trade-offs between model size, quantization level, and performance when deploying large language models in real-world applications.