feat: Add comprehensive metric tracking and trajectory persistence #1

TaoShuchang · 2026-01-09T14:59:51Z

This PR introduces comprehensive metric tracking capabilities and improved environment support for the AgentJet training framework. The changes focus on three main areas: metric collection and reporting, message format conversion, and trajectory persistence.

Key Features

1. Metric Tracking System

Add tool_metric_helper.py and reward_metric_helper.py for comprehensive statistics collection
Track tool usage metrics including:
- Success rate, cache hit rate, and error rate
- Per-tool execution time statistics (mean, max, count)
- Per-tool cache and error breakdowns
Track reward distribution metrics for training and validation phases
Integrate metrics into trainer_verl.py with SwanLab reporting support

2. Message Format Conversion

Add msg_converter.py providing bidirectional conversion between AgentScope and OpenAI message formats
Support grouped steps to OpenAI format conversion
Improve tool call handling in message conversions
Enable better interoperability between different agent frameworks

3. Trajectory Saving

Add save_trajectory.py for persisting training and evaluation trajectories
Support configurable trajectory saving via save_trajectory config option
Save trajectories in structured format for offline analysis

4. FinWorld Environment Support

Add FinWorld service configuration and environment variable support
Update ResourceKeeper to synchronize with actual environment queries
Improve PTY stability for environment interactions
Extend core_env_vars.py with FinWorld-specific configurations

5. Context Tracker Enhancements

Initialize workflow_metadata in context trackers to store tool statistics
Collect tool stats in general_runner.py from context trackers
Enable real-time tool usage monitoring during training

Files Changed

Core Training: trainer_verl.py, general_runner.py
Context Tracking: base_tracker.py, basic_tracker.py, multiagent_tracking.py
Utilities: msg_converter.py, save_trajectory.py, tool_metric_helper.py, reward_metric_helper.py
Configuration: ajet_default.yaml, core_env_vars.py
Environment: resource_keeper.py, pty.py, launcher.py

Statistics

15 files changed: 974 insertions, 31 deletions
6 commits from dev/shuchang branch

- Add --with-finworld launch option for FinWorld service - Add --skip-check-avail-gpu flag to optionally bypass GPU checks - Update load_dotenv to not override existing environment variables - Add finance and API-related environment variables to runtime env - Improve PTY UTF-8 decoding error handling with 'replace' mode - Increase PTY launch wait time from 1800s to 3600s for stability - Add type hints for env_dict parameter in pty_wrapper

…ize with the actual environment query.

…tistics

- Add msg_converter.py for bidirectional OpenAI<->AgentScope format conversion - Support tool_call_id in basic_tracker context serialization - Update multiagent_tracking to use msg_converter utilities - Update schema comments to English - Improve workflow_metadata documentation in base_tracker

- Add save_trajectory.py module with save_train_trajectory and save_eval_trajectory functions - Add save_trajectory config flag in ajet_default.yaml (default: False) - Integrate trajectory saving in trainer_verl.py for both training and evaluation - Extract and save reward_structure, workflow_metadata, and OpenAI-formatted trajectories

- Add tool_metric_helper.py to compute tool usage metrics (success rate, cache hit rate, error rate, execution time) - Add reward_metric_helper.py to compute reward distribution metrics - Integrate metric helpers into trainer_verl.py for both training and validation phases - Update general_runner.py to collect tool statistics from context tracker - Support SwanLab metric reporting with detailed per-tool breakdown

gemini-code-assist · 2026-01-09T15:01:29Z

Summary of Changes

Hello @TaoShuchang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the AgentJet training framework by adding comprehensive metric tracking, message format conversion, trajectory persistence, and improved environment support. These changes aim to provide better insights into agent behavior, improve interoperability, and enable offline analysis of training data.

Highlights

Metric Tracking System: Introduces tool_metric_helper.py and reward_metric_helper.py for comprehensive statistics collection, tracking tool usage and reward distribution metrics, integrated into trainer_verl.py with SwanLab reporting support.
Message Format Conversion: Adds msg_converter.py for bidirectional conversion between AgentScope and OpenAI message formats, improving tool call handling and interoperability between different agent frameworks.
Trajectory Saving: Includes save_trajectory.py for persisting training and evaluation trajectories in a structured format, configurable via the save_trajectory option.
FinWorld Environment Support: Adds FinWorld service configuration and environment variable support, updating ResourceKeeper for environment synchronization and improving PTY stability.
Context Tracker Enhancements: Initializes workflow_metadata in context trackers to store tool statistics, collecting tool stats in general_runner.py for real-time tool usage monitoring during training.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces significant new features for metric tracking, trajectory persistence, and message format conversion. The changes are comprehensive and well-structured, particularly with the new helper modules for metrics. My review identified a couple of critical issues: a logic error in trainer_verl.py that swaps metric updates, and an incorrect import in general_runner.py that will cause a runtime error. I also found a high-severity issue in multiagent_tracking.py that could lead to an UnboundLocalError. Additional feedback includes suggestions to improve code quality by removing unused imports, moving local imports, and ensuring consistency in language for comments and docstrings.

gemini-code-assist · 2026-01-09T15:03:48Z

ajet/backbone/trainer_verl.py

+        if tool_metrics:
+            val_metrics.update(reward_metrics)
+        if reward_metrics:
+            val_metrics.update(tool_metrics)


There is a copy-paste error in the logic for updating val_metrics. The conditions are correct, but the values being updated are swapped. This will result in tool_metrics being updated with reward_metrics data and vice-versa, leading to incorrect metric reporting.

Suggested change

if tool_metrics:

val_metrics.update(reward_metrics)

if reward_metrics:

val_metrics.update(tool_metrics)

if tool_metrics:

val_metrics.update(tool_metrics)

if reward_metrics:

val_metrics.update(reward_metrics)

gemini-code-assist · 2026-01-09T15:03:49Z

ajet/task_runner/general_runner.py

@@ -1,4 +1,5 @@
 import asyncio
+from venv import logger


This import is incorrect and will cause a critical ImportError at runtime. The venv module is for creating Python virtual environments and does not contain a logger. You likely intended to import from loguru.

Suggested change

from venv import logger

from loguru import logger

gemini-code-assist · 2026-01-09T15:03:49Z

ajet/context_tracker/multiagent_tracking.py

-                            )
-                            ignore = True
-                            break
+                extracted_tool_call_id = ""


The variable is_tool_result_msg is used on line 146 but is only assigned within the following for loop. If msg["content"] is an empty list, the loop will not execute, and an UnboundLocalError will be raised. Please initialize is_tool_result_msg to False before the loop to prevent this.

extracted_tool_call_id = "" is_tool_result_msg = False

gemini-code-assist · 2026-01-09T15:03:49Z

ajet/backbone/trainer_verl.py

 from pprint import pprint
 from typing import List, Optional
+from ajet.utils.metric_helper.reward_metric_helper import compute_reward_metrics_from_cmts
+from loguru import logger as loguru_logger


This import of loguru_logger is redundant as logger is already imported from loguru on line 27 and this alias is not used. Please remove it to keep the imports clean.

gemini-code-assist · 2026-01-09T15:03:49Z

ajet/backbone/trainer_verl.py

+    convert_grouped_steps_to_openai_format,
+    convert_ext_msg_to_openai_format,
+    agentscope_to_openai,
+    openai_to_agentscope,


These functions from msg_converter are imported but appear to be unused in this file. Please remove the unused imports to improve code clarity.

gemini-code-assist · 2026-01-09T15:03:49Z

ajet/context_tracker/base_tracker.py

 from typing import List, Tuple, Union
+from typing import List, Union, Tuple, Dict, Optional, Any


The typing module is imported twice, and the first import is now a subset of the second. To improve readability and avoid redundancy, please consolidate these into a single import statement.

Suggested change

from typing import List, Tuple, Union

from typing import List, Union, Tuple, Dict, Optional, Any

from typing import List, Union, Tuple, Dict, Optional, Any

gemini-code-assist · 2026-01-09T15:03:49Z

ajet/context_tracker/multiagent_tracking.py

+    agentscope_to_openai,
+    openai_to_agentscope,
+    agentscope_to_openai_grouped,
+    openai_to_agentscope_grouped,


These functions (agentscope_to_openai, openai_to_agentscope, agentscope_to_openai_grouped, openai_to_agentscope_grouped) are imported but do not appear to be used within this file. Please remove them to keep the import section clean.

gemini-code-assist · 2026-01-09T15:03:49Z

ajet/context_tracker/multiagent_tracking.py

+                msg["content"] = str_content
+                msg["tool_call_id"] = extracted_tool_call_id  # Store extracted tool_call_id
+
+                # ★ 关键修复：如果是 tool_result 消息，将 role 恢复为 "tool"（OpenAI 格式）


This comment is in Chinese, which is inconsistent with the English comments in the rest of the codebase. For maintainability, please translate it to English or remove it if the code is self-explanatory.

# Critical fix: If this is a tool_result message, restore the role to "tool" (OpenAI format).

gemini-code-assist · 2026-01-09T15:03:49Z

ajet/context_tracker/multiagent_tracking.py

+        """
+        将 grouped_steps 转换为 OpenAI 格式并返回。
+        
+        Returns:
+            OpenAI 格式的轨迹数据 (List of List of dict)
+            每条消息格式如：
+            - {"role": "assistant", "content": "...", "tool_calls": [...]}
+            - {"role": "tool", "content": "...", "tool_call_id": "call_xxx"}
+            - {"role": "user/system", "content": "..."}
+        """


This docstring is in Chinese, while the rest of the file's documentation is in English. To maintain consistency and improve readability for all contributors, please translate it to English.

""" Converts grouped_steps to OpenAI format and returns the result. Returns: Trajectory data in OpenAI format (List of List of dict). Each message is formatted as follows: - {"role": "assistant", "content": "...", "tool_calls": [...]} - {"role": "tool", "content": "...", "tool_call_id": "call_xxx"} - {"role": "user/system", "content": "..."} """

gemini-code-assist · 2026-01-09T15:03:49Z

ajet/context_tracker/multiagent_tracking.py

+        """
+        将当前 full_context 转换为 OpenAI 格式并返回。
+        
+        Returns:
+            OpenAI 格式的消息列表 (List of dict)
+        """


This docstring is in Chinese, which is inconsistent with the English documentation in the rest of the file. Please translate it to English for consistency.

""" Converts the current full_context to OpenAI format and returns the result. Returns: A list of messages in OpenAI format (List of dict). """

…ultiAgentContextTracker; add skip GPU check option in launcher

TaoShuchang added 6 commits January 9, 2026 17:44

chore(finworld): Updated the main query in ResourceKeeper to synchron…

96b5071

…ize with the actual environment query.

feat(context_tracker): Initialize workflow_metadata to store tool sta…

697bc5b

…tistics

TaoShuchang requested a review from binary-husky January 9, 2026 14:59

gemini-code-assist bot reviewed Jan 9, 2026

View reviewed changes

binary-husky force-pushed the main branch from e89152a to 6722e2d Compare January 12, 2026 06:18

binary-husky and others added 6 commits January 13, 2026 15:04

Merge branch 'main' into dev/shuchang

b548324

refactor: clean up unused imports and add text extraction method in M…

836e380

…ultiAgentContextTracker; add skip GPU check option in launcher

revise commit

5977031

refactor(utils): remove meesage trasformation between OpenAI, AgentScope

c10e96a

amend config gap

a7dc96d

update benchmark test robot

69ca4d8

binary-husky merged commit a23a867 into main Jan 13, 2026
0 of 2 checks passed

		from typing import List, Tuple, Union
		from typing import List, Union, Tuple, Dict, Optional, Any

feat: Add comprehensive metric tracking and trajectory persistence #1

feat: Add comprehensive metric tracking and trajectory persistence #1

Uh oh!

Conversation

TaoShuchang commented Jan 9, 2026

Key Features

1. Metric Tracking System

2. Message Format Conversion

3. Trajectory Saving

4. FinWorld Environment Support

5. Context Tracker Enhancements

Files Changed

Statistics

Uh oh!

gemini-code-assist bot commented Jan 9, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants