feat: Add comprehensive metric tracking and trajectory persistence #1
- Add --with-finworld launch option for FinWorld service
- Add --skip-check-avail-gpu flag to optionally bypass GPU checks
- Update load_dotenv to not override existing environment variables
- Add finance and API-related environment variables to runtime env
- Improve PTY UTF-8 decoding error handling with 'replace' mode
- Increase PTY launch wait time from 1800s to 3600s for stability
- Add type hints for env_dict parameter in pty_wrapper
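The 'replace' decoding mode mentioned in this commit can be illustrated with a short sketch. The helper name `decode_pty_chunk` is hypothetical and not taken from the PR; it only shows how `errors="replace"` substitutes U+FFFD for undecodable bytes instead of raising `UnicodeDecodeError`.

```python
def decode_pty_chunk(data: bytes) -> str:
    """Decode raw PTY output, replacing undecodable bytes with U+FFFD.

    Hypothetical helper for illustration; the PR's pty wrapper may differ.
    """
    # With the default errors="strict", an invalid byte sequence would raise
    # UnicodeDecodeError and could abort the reader loop.
    return data.decode("utf-8", errors="replace")
```

A chunk that ends in the middle of a multi-byte character would also be replaced under this approach, so an incremental decoder may be preferable where exact output matters.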
…ize with the actual environment query.
- Add msg_converter.py for bidirectional OpenAI<->AgentScope format conversion
- Support tool_call_id in basic_tracker context serialization
- Update multiagent_tracking to use msg_converter utilities
- Update schema comments to English
- Improve workflow_metadata documentation in base_tracker
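As a rough illustration of one direction of the conversion, the sketch below turns a block-style message into OpenAI-style messages. The function name and the input shape (a content list with `text` and `tool_result` blocks carrying `id` and `output`) are assumptions made for this example, not the actual msg_converter.py API.

```python
from typing import Any, Dict, List


def tool_result_blocks_to_openai(msg: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert one block-style message into OpenAI-style messages.

    Assumed input shape (not confirmed by the PR):
    {"role": ..., "content": str | [{"type": "text", "text": ...},
                                    {"type": "tool_result", "id": ..., "output": ...}]}
    """
    content = msg.get("content")
    if isinstance(content, str):
        # Plain string content needs no conversion.
        return [{"role": msg["role"], "content": content}]

    converted: List[Dict[str, Any]] = []
    text_parts: List[str] = []
    for block in content or []:
        if block.get("type") == "tool_result":
            # OpenAI represents a tool result as its own message with role "tool".
            converted.append(
                {
                    "role": "tool",
                    "content": str(block.get("output", "")),
                    "tool_call_id": block.get("id", ""),
                }
            )
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))

    if text_parts:
        # Simplification: any text is emitted first, under the original role.
        converted.insert(0, {"role": msg["role"], "content": "".join(text_parts)})
    return converted
```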
- Add save_trajectory.py module with save_train_trajectory and save_eval_trajectory functions
- Add save_trajectory config flag in ajet_default.yaml (default: False)
- Integrate trajectory saving in trainer_verl.py for both training and evaluation
- Extract and save reward_structure, workflow_metadata, and OpenAI-formatted trajectories
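A minimal sketch of what persisting trajectories could look like, assuming one JSON record per line. The function name, directory layout, and record keys here are illustrative assumptions; the real save_train_trajectory and save_eval_trajectory signatures are not shown in this PR page.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def save_trajectories_jsonl(
    records: List[Dict[str, Any]], out_dir: str, step: int, split: str = "train"
) -> Path:
    """Write one JSON object per line to <out_dir>/<split>/step_<step>.jsonl.

    Hypothetical helper: each record is assumed to hold keys such as
    "reward_structure", "workflow_metadata" and "trajectory".
    """
    out_path = Path(out_dir) / split / f"step_{step}.jsonl"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII message text readable on disk.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path
```

Gating the call on the save_trajectory config flag keeps the default behavior unchanged, since the flag defaults to False.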
- Add tool_metric_helper.py to compute tool usage metrics (success rate, cache hit rate, error rate, execution time)
- Add reward_metric_helper.py to compute reward distribution metrics
- Integrate metric helpers into trainer_verl.py for both training and validation phases
- Update general_runner.py to collect tool statistics from context tracker
- Support SwanLab metric reporting with detailed per-tool breakdown
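The per-tool metrics listed in this commit could be aggregated along the following lines. This is a sketch under assumptions: the record fields (`name`, `success`, `cache_hit`, `time`) and the metric key layout are invented for illustration and may not match tool_metric_helper.py.

```python
from collections import defaultdict
from typing import Any, Dict, List


def compute_tool_metrics(
    tool_calls: List[Dict[str, Any]], prefix: str = "tool"
) -> Dict[str, float]:
    """Aggregate success rate, error rate, cache hit rate and mean time per tool.

    Assumed record shape (hypothetical):
    {"name": str, "success": bool, "cache_hit": bool, "time": float}
    """
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for call in tool_calls:
        grouped[call["name"]].append(call)

    metrics: Dict[str, float] = {}
    for name, calls in grouped.items():
        n = len(calls)  # n >= 1 because a group exists only if a call was seen
        successes = sum(1 for c in calls if c.get("success", False))
        cache_hits = sum(1 for c in calls if c.get("cache_hit", False))
        total_time = sum(float(c.get("time", 0.0)) for c in calls)
        metrics[f"{prefix}/{name}/success_rate"] = successes / n
        metrics[f"{prefix}/{name}/error_rate"] = (n - successes) / n
        metrics[f"{prefix}/{name}/cache_hit_rate"] = cache_hits / n
        metrics[f"{prefix}/{name}/mean_time_s"] = total_time / n
    return metrics
```

Flat, slash-separated keys of this kind are convenient for experiment trackers that group charts by key prefix.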
Summary of Changes
Hello @TaoShuchang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the AgentJet training framework by adding comprehensive metric tracking, message format conversion, trajectory persistence, and improved environment support. These changes aim to provide better insights into agent behavior, improve interoperability, and enable offline analysis of training data.
Code Review
This pull request introduces significant new features for metric tracking, trajectory persistence, and message format conversion. The changes are comprehensive and well-structured, particularly with the new helper modules for metrics. My review identified a couple of critical issues: a logic error in trainer_verl.py that swaps metric updates, and an incorrect import in general_runner.py that will cause a runtime error. I also found a high-severity issue in multiagent_tracking.py that could lead to an UnboundLocalError. Additional feedback includes suggestions to improve code quality by removing unused imports, moving local imports, and ensuring consistency in language for comments and docstrings.
ajet/backbone/trainer_verl.py
Outdated
if tool_metrics:
    val_metrics.update(reward_metrics)
if reward_metrics:
    val_metrics.update(tool_metrics)
There is a copy-paste error in the logic for updating val_metrics. The conditions are correct, but the arguments to update() are swapped: val_metrics receives reward_metrics when tool_metrics is non-empty, and tool_metrics when reward_metrics is non-empty. If only one of the two is populated, its values are never merged, which leads to incomplete metric reporting.
- if tool_metrics:
-     val_metrics.update(reward_metrics)
- if reward_metrics:
-     val_metrics.update(tool_metrics)
+ if tool_metrics:
+     val_metrics.update(tool_metrics)
+ if reward_metrics:
+     val_metrics.update(reward_metrics)
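One way to make this kind of swap harder to reintroduce is to route the merge through a small helper that can be unit-tested. The helper below is a hypothetical sketch, not code from the PR.

```python
from typing import Dict, Optional


def merge_val_metrics(
    val_metrics: Dict[str, float],
    tool_metrics: Optional[Dict[str, float]] = None,
    reward_metrics: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Merge optional metric dicts into val_metrics in place and return it."""
    # Each guard checks the same dict that it merges, so an empty or missing
    # source is skipped without affecting the other one.
    if tool_metrics:
        val_metrics.update(tool_metrics)
    if reward_metrics:
        val_metrics.update(reward_metrics)
    return val_metrics
```

With the swapped version, calling the equivalent logic with only tool metrics present would leave val_metrics without them, which is the case a test for this helper should cover.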
@@ -1,4 +1,5 @@
import asyncio
from venv import logger
)
ignore = True
break
extracted_tool_call_id = ""
The variable is_tool_result_msg is used on line 146 but is only assigned within the following for loop. If msg["content"] is an empty list, the loop will not execute, and an UnboundLocalError will be raised. Please initialize is_tool_result_msg to False before the loop to prevent this.
extracted_tool_call_id = ""
is_tool_result_msg = False
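The failure mode and the fix can be reproduced in isolation. The function below is a hypothetical reduction of the loop, with an assumed block shape (`type` and `id` keys); it is not the code from multiagent_tracking.py.

```python
from typing import Any, Dict, List, Tuple


def scan_content_blocks(content: List[Dict[str, Any]]) -> Tuple[bool, str]:
    """Return (is_tool_result_msg, extracted_tool_call_id) for a content list."""
    extracted_tool_call_id = ""
    # Initialized before the loop so the name is bound even when `content`
    # is empty and the loop body never runs.
    is_tool_result_msg = False
    for block in content:
        if block.get("type") == "tool_result":
            is_tool_result_msg = True
            extracted_tool_call_id = block.get("id", "")
    return is_tool_result_msg, extracted_tool_call_id
```

Without the initialization, an empty list would reach the return statement with `is_tool_result_msg` never assigned, and Python would raise `UnboundLocalError`.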
ajet/backbone/trainer_verl.py
Outdated
from pprint import pprint
from typing import List, Optional
from ajet.utils.metric_helper.reward_metric_helper import compute_reward_metrics_from_cmts
from loguru import logger as loguru_logger
ajet/backbone/trainer_verl.py
Outdated
convert_grouped_steps_to_openai_format,
convert_ext_msg_to_openai_format,
agentscope_to_openai,
openai_to_agentscope,
from typing import List, Tuple, Union
from typing import List, Union, Tuple, Dict, Optional, Any
The typing module is imported twice, and the first import is now a subset of the second. To improve readability and avoid redundancy, please consolidate these into a single import statement.
- from typing import List, Tuple, Union
- from typing import List, Union, Tuple, Dict, Optional, Any
+ from typing import List, Union, Tuple, Dict, Optional, Any
agentscope_to_openai,
openai_to_agentscope,
agentscope_to_openai_grouped,
openai_to_agentscope_grouped,
msg["content"] = str_content
msg["tool_call_id"] = extracted_tool_call_id  # Store extracted tool_call_id

# ★ Key fix: if this is a tool_result message, restore role to "tool" (OpenAI format)
"""
将 grouped_steps 转换为 OpenAI 格式并返回。
Returns:
OpenAI 格式的轨迹数据 (List of List of dict)
每条消息格式如:
- {"role": "assistant", "content": "...", "tool_calls": [...]}
- {"role": "tool", "content": "...", "tool_call_id": "call_xxx"}
- {"role": "user/system", "content": "..."}
"""
This docstring is in Chinese, while the rest of the file's documentation is in English. To maintain consistency and improve readability for all contributors, please translate it to English.
"""
Converts grouped_steps to OpenAI format and returns the result.
Returns:
Trajectory data in OpenAI format (List of List of dict).
Each message is formatted as follows:
- {"role": "assistant", "content": "...", "tool_calls": [...]}
- {"role": "tool", "content": "...", "tool_call_id": "call_xxx"}
- {"role": "user/system", "content": "..."}
"""

"""
Converts the current full_context to OpenAI format and returns it.
Returns:
A list of messages in OpenAI format (List of dict)
"""
…ultiAgentContextTracker; add skip GPU check option in launcher
This PR introduces comprehensive metric tracking capabilities and improved environment support for the AgentJet training framework. The changes focus on three main areas: metric collection and reporting, message format conversion, and trajectory persistence.
Key Features
1. Metric Tracking System
- tool_metric_helper.py and reward_metric_helper.py for comprehensive statistics collection
- trainer_verl.py with SwanLab reporting support
2. Message Format Conversion
- msg_converter.py providing bidirectional conversion between AgentScope and OpenAI message formats
3. Trajectory Saving
- save_trajectory.py for persisting training and evaluation trajectories
- save_trajectory config option
4. FinWorld Environment Support
- ResourceKeeper to synchronize with actual environment queries
- core_env_vars.py with FinWorld-specific configurations
5. Context Tracker Enhancements
- workflow_metadata in context trackers to store tool statistics
- general_runner.py from context trackers
Files Changed
- trainer_verl.py, general_runner.py
- base_tracker.py, basic_tracker.py, multiagent_tracking.py
- msg_converter.py, save_trajectory.py, tool_metric_helper.py, reward_metric_helper.py
- ajet_default.yaml, core_env_vars.py
- resource_keeper.py, pty.py, launcher.py
Statistics