Conversation

@TaoShuchang
Collaborator

This PR introduces comprehensive metric tracking capabilities and improved environment support for the AgentJet training framework. The changes cover five areas: metric collection and reporting, message format conversion, trajectory persistence, FinWorld environment support, and context tracker enhancements.

Key Features

1. Metric Tracking System

  • Add tool_metric_helper.py and reward_metric_helper.py for comprehensive statistics collection
  • Track tool usage metrics including:
    • Success rate, cache hit rate, and error rate
    • Per-tool execution time statistics (mean, max, count)
    • Per-tool cache and error breakdowns
  • Track reward distribution metrics for training and validation phases
  • Integrate metrics into trainer_verl.py with SwanLab reporting support
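
As a rough illustration of the tool metrics listed above, the sketch below aggregates per-call records into overall success, cache-hit, and error rates plus per-tool timing. The record fields (`tool`, `ok`, `cached`, `seconds`), the metric key names, and the function name `summarize_tool_calls` are hypothetical; they are not the actual interface of tool_metric_helper.py.

```python
from collections import defaultdict
from statistics import mean


def summarize_tool_calls(calls):
    """Aggregate hypothetical tool-call records into flat metric names."""
    if not calls:
        return {}
    total = len(calls)
    metrics = {
        "tool/success_rate": sum(c["ok"] for c in calls) / total,
        "tool/cache_hit_rate": sum(c["cached"] for c in calls) / total,
        "tool/error_rate": sum(not c["ok"] for c in calls) / total,
    }
    # Group execution times by tool name for the per-tool breakdown.
    by_tool = defaultdict(list)
    for c in calls:
        by_tool[c["tool"]].append(c["seconds"])
    for name, times in by_tool.items():
        metrics[f"tool/{name}/time_mean"] = mean(times)
        metrics[f"tool/{name}/time_max"] = max(times)
        metrics[f"tool/{name}/count"] = len(times)
    return metrics


calls = [
    {"tool": "search", "ok": True, "cached": False, "seconds": 0.5},
    {"tool": "search", "ok": True, "cached": True, "seconds": 0.25},
    {"tool": "quote", "ok": False, "cached": False, "seconds": 1.0},
    {"tool": "quote", "ok": True, "cached": False, "seconds": 2.0},
]
summary = summarize_tool_calls(calls)
```

Flat, slash-separated keys are used here only because they are easy to pass to a metrics dashboard as a single dictionary; the real helper may structure its output differently.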

2. Message Format Conversion

  • Add msg_converter.py providing bidirectional conversion between AgentScope and OpenAI message formats
  • Support grouped steps to OpenAI format conversion
  • Improve tool call handling in message conversions
  • Enable better interoperability between different agent frameworks
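
To make the conversion direction concrete, here is a minimal sketch of turning one block-structured message into OpenAI-style messages. The input block layout (`text`, `tool_use`, and `tool_result` blocks with `id`, `name`, `input`, and `output` fields) is an assumption about the AgentScope side, and `agentscope_to_openai_sketch` is a hypothetical name; the real msg_converter.py may differ.

```python
import json


def agentscope_to_openai_sketch(msg):
    """Convert one block-structured message (assumed layout) to OpenAI messages."""
    content = msg["content"]
    if isinstance(content, str):
        return [{"role": msg["role"], "content": content}]

    text_parts, tool_calls, tool_results = [], [], []
    for block in content:
        if block["type"] == "text":
            text_parts.append(block["text"])
        elif block["type"] == "tool_use":
            tool_calls.append({
                "id": block["id"],
                "type": "function",
                "function": {
                    "name": block["name"],
                    # OpenAI carries call arguments as a JSON string.
                    "arguments": json.dumps(block["input"]),
                },
            })
        elif block["type"] == "tool_result":
            # Each tool result becomes its own message with role "tool".
            tool_results.append({
                "role": "tool",
                "content": str(block["output"]),
                "tool_call_id": block["id"],
            })

    out = []
    if text_parts or tool_calls:
        converted = {"role": msg["role"], "content": "".join(text_parts)}
        if tool_calls:
            converted["tool_calls"] = tool_calls
        out.append(converted)
    out.extend(tool_results)
    return out


assistant = {"role": "assistant", "content": [
    {"type": "text", "text": "Checking the price."},
    {"type": "tool_use", "id": "call_1", "name": "get_quote",
     "input": {"symbol": "AAPL"}},
]}
result = {"role": "system", "content": [
    {"type": "tool_result", "id": "call_1", "output": "189.5"},
]}
converted = agentscope_to_openai_sketch(assistant) + agentscope_to_openai_sketch(result)
```

The key point is that the tool result keeps the same identifier as the call that produced it, so the two can be paired after conversion.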

3. Trajectory Saving

  • Add save_trajectory.py for persisting training and evaluation trajectories
  • Support configurable trajectory saving via save_trajectory config option
  • Save trajectories in structured format for offline analysis
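
A configurable trajectory dump could look roughly like the sketch below, which writes one JSON object per line and does nothing when the flag is off. The function name, the file naming scheme, and the record keys (`messages`, `reward`, `workflow_metadata`) are assumptions for illustration, not the contents of save_trajectory.py.

```python
import json
import tempfile
from pathlib import Path


def save_trajectories_sketch(trajectories, out_dir, step, enabled=False):
    """Write one JSON object per line; skip entirely when the flag is off."""
    if not enabled:
        return None
    out_path = Path(out_dir) / f"step_{step}.jsonl"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        for traj in trajectories:
            f.write(json.dumps(traj) + "\n")
    return out_path


example = [{
    "messages": [{"role": "user", "content": "hello"}],
    "reward": 1.0,
    "workflow_metadata": {"tool_stats": []},
}]
with tempfile.TemporaryDirectory() as tmp:
    path = save_trajectories_sketch(example, tmp, step=3, enabled=True)
    lines = path.read_text(encoding="utf-8").splitlines()
    loaded = [json.loads(line) for line in lines]

skipped = save_trajectories_sketch(example, "unused_dir", step=3)
```

JSON Lines is chosen here because each trajectory can be appended and read back independently during offline analysis.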

4. FinWorld Environment Support

  • Add FinWorld service configuration and environment variable support
  • Update ResourceKeeper to synchronize with actual environment queries
  • Improve PTY stability for environment interactions
  • Extend core_env_vars.py with FinWorld-specific configurations

5. Context Tracker Enhancements

  • Initialize workflow_metadata in context trackers to store tool statistics
  • Collect tool stats in general_runner.py from context trackers
  • Enable real-time tool usage monitoring during training
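
The collection step can be pictured as follows: each tracker owns a `workflow_metadata` dictionary that is initialized up front, and the runner flattens the tool statistics from every tracker. `TrackerSketch`, `record_tool_call`, `collect_tool_stats`, and the `tool_stats` key are hypothetical names for this sketch, not the classes in base_tracker.py or general_runner.py.

```python
class TrackerSketch:
    """Hypothetical stand-in for a context tracker."""

    def __init__(self):
        # Initialized in the constructor so readers never hit a missing attribute.
        self.workflow_metadata = {"tool_stats": []}

    def record_tool_call(self, tool, ok, seconds):
        self.workflow_metadata["tool_stats"].append(
            {"tool": tool, "ok": ok, "seconds": seconds}
        )


def collect_tool_stats(trackers):
    """Flatten tool stats from every tracker, tolerating missing metadata."""
    stats = []
    for tracker in trackers:
        metadata = getattr(tracker, "workflow_metadata", None) or {}
        stats.extend(metadata.get("tool_stats", []))
    return stats


first, second = TrackerSketch(), TrackerSketch()
first.record_tool_call("search", ok=True, seconds=0.4)
second.record_tool_call("quote", ok=False, seconds=1.2)
all_stats = collect_tool_stats([first, second, object()])
```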

Files Changed

  • Core Training: trainer_verl.py, general_runner.py
  • Context Tracking: base_tracker.py, basic_tracker.py, multiagent_tracking.py
  • Utilities: msg_converter.py, save_trajectory.py, tool_metric_helper.py, reward_metric_helper.py
  • Configuration: ajet_default.yaml, core_env_vars.py
  • Environment: resource_keeper.py, pty.py, launcher.py

Statistics

  • 15 files changed: 974 insertions, 31 deletions
  • 6 commits from dev/shuchang branch

- Add --with-finworld launch option for FinWorld service
- Add --skip-check-avail-gpu flag to optionally bypass GPU checks
- Update load_dotenv to not override existing environment variables
- Add finance and API-related environment variables to runtime env
- Improve PTY UTF-8 decoding error handling with 'replace' mode
- Increase PTY launch wait time from 1800s to 3600s for stability
- Add type hints for env_dict parameter in pty_wrapper
- Add msg_converter.py for bidirectional OpenAI<->AgentScope format conversion
- Support tool_call_id in basic_tracker context serialization
- Update multiagent_tracking to use msg_converter utilities
- Update schema comments to English
- Improve workflow_metadata documentation in base_tracker
- Add save_trajectory.py module with save_train_trajectory and save_eval_trajectory functions
- Add save_trajectory config flag in ajet_default.yaml (default: False)
- Integrate trajectory saving in trainer_verl.py for both training and evaluation
- Extract and save reward_structure, workflow_metadata, and OpenAI-formatted trajectories
- Add tool_metric_helper.py to compute tool usage metrics (success rate, cache hit rate, error rate, execution time)
- Add reward_metric_helper.py to compute reward distribution metrics
- Integrate metric helpers into trainer_verl.py for both training and validation phases
- Update general_runner.py to collect tool statistics from context tracker
- Support SwanLab metric reporting with detailed per-tool breakdown
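
For the PTY decoding note above, the following sketch shows what 'replace' mode buys when output arrives in arbitrary byte chunks. How pty.py actually reads and decodes bytes is not shown in this conversation, so `decode_chunks` is a hypothetical helper built only on the standard library.

```python
import codecs


def decode_chunks(chunks):
    """Decode byte chunks as UTF-8, replacing invalid bytes instead of raising."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    # Flush any bytes still buffered at end of stream.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# "é" is b"\xc3\xa9"; here it is split across two reads,
# followed by a stray byte that is not valid UTF-8.
text = decode_chunks([b"caf\xc3", b"\xa9 ok", b"\xff!"])
```

An incremental decoder keeps a multi-byte character intact when a read boundary falls in the middle of it, while `errors="replace"` substitutes U+FFFD for bytes that can never be valid.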
@gemini-code-assist

Summary of Changes

Hello @TaoShuchang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the AgentJet training framework by adding comprehensive metric tracking, message format conversion, trajectory persistence, and improved environment support. These changes aim to provide better insights into agent behavior, improve interoperability, and enable offline analysis of training data.

Highlights

  • Metric Tracking System: Introduces tool_metric_helper.py and reward_metric_helper.py for comprehensive statistics collection, tracking tool usage and reward distribution metrics, integrated into trainer_verl.py with SwanLab reporting support.
  • Message Format Conversion: Adds msg_converter.py for bidirectional conversion between AgentScope and OpenAI message formats, improving tool call handling and interoperability between different agent frameworks.
  • Trajectory Saving: Includes save_trajectory.py for persisting training and evaluation trajectories in a structured format, configurable via the save_trajectory option.
  • FinWorld Environment Support: Adds FinWorld service configuration and environment variable support, updating ResourceKeeper for environment synchronization and improving PTY stability.
  • Context Tracker Enhancements: Initializes workflow_metadata in context trackers to store tool statistics, collecting tool stats in general_runner.py for real-time tool usage monitoring during training.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant new features for metric tracking, trajectory persistence, and message format conversion. The changes are comprehensive and well-structured, particularly with the new helper modules for metrics. My review identified a couple of critical issues: a logic error in trainer_verl.py that swaps metric updates, and an incorrect import in general_runner.py that binds logger to the wrong module. I also found a high-severity issue in multiagent_tracking.py that could lead to an UnboundLocalError. Additional feedback includes suggestions to improve code quality by removing unused imports, moving local imports, and ensuring consistency in language for comments and docstrings.

Comment on lines 1073 to 1076
if tool_metrics:
    val_metrics.update(reward_metrics)
if reward_metrics:
    val_metrics.update(tool_metrics)

critical

There is a copy-paste error in the logic for updating val_metrics. The conditions are correct, but the dictionaries passed to update() are swapped: reward_metrics is merged when tool_metrics is non-empty, and vice versa. As a result, a non-empty dictionary is silently dropped whenever the other one is empty, leading to incorrect metric reporting.

Suggested change
- if tool_metrics:
-     val_metrics.update(reward_metrics)
- if reward_metrics:
-     val_metrics.update(tool_metrics)
+ if tool_metrics:
+     val_metrics.update(tool_metrics)
+ if reward_metrics:
+     val_metrics.update(reward_metrics)
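
The effect of the swap is easiest to see with one empty dictionary. The sketch below wraps both versions in hypothetical helper functions (`merge_swapped`, `merge_fixed`) and uses made-up metric names; only the four-line bodies come from the diff.

```python
def merge_swapped(tool_metrics, reward_metrics):
    """The logic as submitted: each guard merges the other dictionary."""
    val_metrics = {}
    if tool_metrics:
        val_metrics.update(reward_metrics)
    if reward_metrics:
        val_metrics.update(tool_metrics)
    return val_metrics


def merge_fixed(tool_metrics, reward_metrics):
    """The suggested logic: each guard merges the dictionary it checks."""
    val_metrics = {}
    if tool_metrics:
        val_metrics.update(tool_metrics)
    if reward_metrics:
        val_metrics.update(reward_metrics)
    return val_metrics


tool_only = {"tool/success_rate": 0.9}
swapped_result = merge_swapped(tool_only, {})
fixed_result = merge_fixed(tool_only, {})

both_swapped = merge_swapped(tool_only, {"reward/mean": 0.5})
both_fixed = merge_fixed(tool_only, {"reward/mean": 0.5})
```

When both dictionaries are non-empty the two versions agree, which is why the bug can go unnoticed until one source of metrics is missing.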

@@ -1,4 +1,5 @@
import asyncio
from venv import logger

critical

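This import is incorrect. The venv module is for creating Python virtual environments. In CPython it happens to define a module-level logger (a standard logging.Logger), so the import succeeds silently rather than raising ImportError, but log calls bypass loguru and any loguru-only method such as logger.success() will fail. You likely intended to import from loguru.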

Suggested change
- from venv import logger
+ from loguru import logger
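
What `from venv import logger` actually binds can be checked with the standard library alone. The sketch below assumes a CPython installation, where the venv package defines a module-level `logger`; it does not require loguru.

```python
import logging
import venv

# The name that `from venv import logger` would bind.
venv_logger = venv.logger

# It is an ordinary stdlib logger, not a loguru logger.
is_stdlib_logger = isinstance(venv_logger, logging.Logger)

# loguru-specific methods such as success() are absent on stdlib loggers.
has_loguru_success = hasattr(venv_logger, "success")
```

This kind of import is typically produced by an editor's auto-import picking the first module that exports a matching name, so it is easy to miss in review.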

)
ignore = True
break
extracted_tool_call_id = ""

high

The variable is_tool_result_msg is used on line 146 but is only assigned within the following for loop. If msg["content"] is an empty list, the loop will not execute, and an UnboundLocalError will be raised. Please initialize is_tool_result_msg to False before the loop to prevent this.

                extracted_tool_call_id = ""
                is_tool_result_msg = False
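
The failure mode can be reproduced in isolation. The two functions below are hypothetical reductions of the loop in multiagent_tracking.py, with an assumed block layout (`{"type": ...}`); they show why an empty content list triggers the error and how the initialization avoids it.

```python
def has_tool_result_buggy(blocks):
    """Assigns the flag only inside the loop, as in the code under review."""
    for block in blocks:
        is_tool_result_msg = block.get("type") == "tool_result"
    return is_tool_result_msg  # unbound when the loop body never ran


def has_tool_result_fixed(blocks):
    """Initializes the flag before the loop, as suggested."""
    is_tool_result_msg = False
    for block in blocks:
        if block.get("type") == "tool_result":
            is_tool_result_msg = True
    return is_tool_result_msg


try:
    has_tool_result_buggy([])
    raised = False
except UnboundLocalError:
    raised = True

empty_result = has_tool_result_fixed([])
found_result = has_tool_result_fixed([{"type": "text"}, {"type": "tool_result"}])
```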

from pprint import pprint
from typing import List, Optional
from ajet.utils.metric_helper.reward_metric_helper import compute_reward_metrics_from_cmts
from loguru import logger as loguru_logger

medium

This import of loguru_logger is redundant as logger is already imported from loguru on line 27 and this alias is not used. Please remove it to keep the imports clean.

Comment on lines 61 to 64
convert_grouped_steps_to_openai_format,
convert_ext_msg_to_openai_format,
agentscope_to_openai,
openai_to_agentscope,

medium

These functions from msg_converter are imported but appear to be unused in this file. Please remove the unused imports to improve code clarity.

Comment on lines 1 to +2
from typing import List, Tuple, Union
from typing import List, Union, Tuple, Dict, Optional, Any

medium

The typing module is imported twice, and the first import is now a subset of the second. To improve readability and avoid redundancy, please consolidate these into a single import statement.

Suggested change
- from typing import List, Tuple, Union
- from typing import List, Union, Tuple, Dict, Optional, Any
+ from typing import List, Union, Tuple, Dict, Optional, Any

Comment on lines 27 to 30
agentscope_to_openai,
openai_to_agentscope,
agentscope_to_openai_grouped,
openai_to_agentscope_grouped,

medium

These functions (agentscope_to_openai, openai_to_agentscope, agentscope_to_openai_grouped, openai_to_agentscope_grouped) are imported but do not appear to be used within this file. Please remove them to keep the import section clean.

msg["content"] = str_content
msg["tool_call_id"] = extracted_tool_call_id # Store extracted tool_call_id

# ★ 关键修复:如果是 tool_result 消息,将 role 恢复为 "tool"(OpenAI 格式)

medium

This comment is in Chinese, which is inconsistent with the English comments in the rest of the codebase. For maintainability, please translate it to English or remove it if the code is self-explanatory.

                # Critical fix: If this is a tool_result message, restore the role to "tool" (OpenAI format).

Comment on lines 610 to 619
"""
将 grouped_steps 转换为 OpenAI 格式并返回。
Returns:
OpenAI 格式的轨迹数据 (List of List of dict)
每条消息格式如:
- {"role": "assistant", "content": "...", "tool_calls": [...]}
- {"role": "tool", "content": "...", "tool_call_id": "call_xxx"}
- {"role": "user/system", "content": "..."}
"""

medium

This docstring is in Chinese, while the rest of the file's documentation is in English. To maintain consistency and improve readability for all contributors, please translate it to English.

        """
        Converts grouped_steps to OpenAI format and returns the result.
        
        Returns:
            Trajectory data in OpenAI format (List of List of dict).
            Each message is formatted as follows:
            - {"role": "assistant", "content": "...", "tool_calls": [...]}
            - {"role": "tool", "content": "...", "tool_call_id": "call_xxx"}
            - {"role": "user/system", "content": "..."}
        """

Comment on lines 623 to 628
"""
将当前 full_context 转换为 OpenAI 格式并返回。
Returns:
OpenAI 格式的消息列表 (List of dict)
"""

medium

This docstring is in Chinese, which is inconsistent with the English documentation in the rest of the file. Please translate it to English for consistency.

        """
        Converts the current full_context to OpenAI format and returns the result.
        
        Returns:
            A list of messages in OpenAI format (List of dict).
        """

@binary-husky binary-husky merged commit a23a867 into main Jan 13, 2026
0 of 2 checks passed