[5525939] Allow user to select target opset in MOQ; upgrade onnxruntime #809
base: main
Conversation
- Allow user to select the target opset
- Minimum opset will be defined according to quantization mode
- Upgrade onnxruntime to 1.23.0 to support nvfp4
- Add tests in tests/unit/onnx/test_quantize_api.py

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
📝 Walkthrough
This change introduces opset version control to ONNX quantization, allowing users to specify target ONNX opset versions via CLI.
Sequence Diagram

sequenceDiagram
participant User as User/CLI
participant Quantizer as quantize()
participant Preprocessor as _preprocess_onnx()
participant OpsetUtil as Opset Utils
participant Converter as convert_to_f16()
participant Model as ONNX Model
User->>Quantizer: quantize(model, opset=X)
Quantizer->>Preprocessor: _preprocess_onnx(model, opset=X)
Preprocessor->>OpsetUtil: get_qdq_precisions(model)
OpsetUtil-->>Preprocessor: precision_set
Preprocessor->>OpsetUtil: get_min_opset_for_precisions(precision_set)
OpsetUtil-->>Preprocessor: mode_min_opset
alt opset provided
Preprocessor->>Preprocessor: validate opset vs mode_min_opset
alt opset < mode_min_opset
Preprocessor->>Preprocessor: warn & upgrade to mode_min_opset
end
else opset not provided
Preprocessor->>Preprocessor: select max(original_opset, mode_min_opset)
end
Preprocessor->>Model: convert to target_opset
Preprocessor-->>Quantizer: preprocessed_model
Quantizer->>Converter: convert_to_f16(model, opset=target_opset)
Converter->>OpsetUtil: get_qdq_precisions(model)
Converter->>OpsetUtil: get_min_opset_for_precisions(precision_set)
Converter->>Converter: compute min_opset with validation
Converter-->>Quantizer: converted_model
Quantizer-->>User: quantized_model
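To make the flow above concrete, here is a minimal usage sketch. Only the opset and quantize_mode names come from this review; the import path, the other argument names, and the file path are illustrative assumptions rather than the verified API.

# Hypothetical usage sketch of the new opset control; the import path and the
# argument names other than `opset` and `quantize_mode` are assumptions, not
# the verified API surface of this PR.
from modelopt.onnx.quantization import quantize

quantize(
    onnx_path="model.onnx",   # illustrative input path
    quantize_mode="int8",     # e.g. int8 / fp8 / int4 / nvfp4, per the PR description
    opset=21,                 # requested target opset; raised automatically if the
                              # quantization mode requires a higher minimum
)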
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/onnx/quantization/quantize.py (1)
440-521: Pass the effective target opset downstream.
_preprocess_onnx may upgrade the model's opset via onnx.version_converter.convert_version(), but quantize_int8 and quantize_fp8 receive the raw user opset input. This creates a mismatch: the quantizers' opset parameter may diverge from the actual model opset after preprocessing. Pass get_opset_version(onnx_model) (already imported) instead.
🔧 Suggested fix
     ) = _preprocess_onnx(
         onnx_path,
         use_external_data_format,
         output_path,
@@
         simplify,
         quantize_mode,
         opset,
     )
+    target_opset = get_opset_version(onnx_model)
@@
-        opset=opset,
+        opset=target_opset,
         **kwargs,
     )
🤖 Fix all issues with AI agents
In `@modelopt/onnx/autocast/convert.py`:
- Around line 222-235: The docstring for the conversion routine is out of sync
with the implementation: low_precision_type uses a base_min_opset of 19 for
"fp16" (see base_min_opset and low_precision_type in convert.py) but the
docstring still claims 13; update the docstring to state that the default
minimum opset for fp16 is 19 (and bf16 is 22) and keep the note that Q/DQ nodes
may require increasing the opset (e.g., FP8/INT4/NVFP4) so the documentation
matches the logic in get_qdq_precisions, get_min_opset_for_precisions and the
base_min_opset assignment.
In `@modelopt/onnx/quantization/__main__.py`:
- Around line 289-297: Update the help text for the "--opset" argument added via
argparser.add_argument to include bf16 minimum opset info: mention that
--high_precision_dtype bf16 may require opset 22 (in addition to the existing
note about 19 for fp16, 21 for int4, 23 for nvfp4), and make the same change to
the other duplicate "--opset" help string found elsewhere in this module; ensure
the message clearly states that opset may be automatically increased if required
by operations.
In `@modelopt/onnx/quantization/quantize.py`:
- Around line 124-130: The min-opset lookup uses exact keys so variants like
"int4_awq" fall back to BASE_MIN_OPSET; normalize quantize_mode before querying
QDQ_PRECISION_MIN_OPSET: compute a normalized_mode (e.g., if "int4" in
quantize_mode -> "int4", if "nvfp4" in quantize_mode or "float4" variant ->
"float4_e2m1fn", etc.), then use QDQ_PRECISION_MIN_OPSET.get(normalized_mode,
BASE_MIN_OPSET) when setting mode_min_opset; update references in quantize.py
that use quantize_mode (including the substring checks and the get_opset_version
flow) to use the normalized value so variants resolve to the correct minimum
opset.
In `@modelopt/onnx/utils.py`:
- Around line 702-731: get_qdq_precisions currently only inspects
DequantizeLinear nodes with Constant inputs and misses QuantizeLinear nodes and
non-constant/Variable paths (activations), causing under-reporting of
precisions; update get_qdq_precisions to also iterate QuantizeLinear nodes and
extract precision from their output_dtype attribute where present, and for both
QuantizeLinear and DequantizeLinear handle Variable inputs by resolving the
tensor type via the graph/model value_info or node.output type (e.g., check
graph.value_info / model.graph.value_info / model.graph.input/output types for
the corresponding tensor and use its elem_type/name), while still keeping the
existing Constant-path logic (Constant.values.dtype.name) and preserving
detection of custom nodes like TRT_FP4DynamicQuantize.
🧹 Nitpick comments (1)
tests/unit/onnx/test_quantize_api.py (1)
29-34: Avoid hard-coded min-opset duplication in tests. To reduce drift, consider deriving these values from the production constants.
♻️ Suggested refactor
-from modelopt.onnx.utils import get_opset_version
+from modelopt.onnx.utils import BASE_MIN_OPSET, QDQ_PRECISION_MIN_OPSET, get_opset_version
@@
 MIN_OPSET = {
-    "int8": 19,
-    "fp8": 19,
-    "int4": 21,
+    "int8": BASE_MIN_OPSET,
+    "fp8": BASE_MIN_OPSET,
+    "int4": QDQ_PRECISION_MIN_OPSET["int4"],
 }
argparser.add_argument(
    "--opset",
    type=int,
    help=(
        "Target ONNX opset version for the quantized model. If not specified, uses default minimum opset "
        "(19 for fp16 scales support, 21 for int4, 23 for nvfp4). The opset may be automatically increased "
        "if certain operations require a higher version."
    ),
)
Clarify BF16 minimum opset in --opset help text.
--high_precision_dtype bf16 can require opset 22; the help message lists fp16/int4/nvfp4 only. Adding bf16 avoids confusion.
✏️ Suggested doc tweak
- "Target ONNX opset version for the quantized model. If not specified, uses default minimum opset "
- "(19 for fp16 scales support, 21 for int4, 23 for nvfp4). The opset may be automatically increased "
+ "Target ONNX opset version for the quantized model. If not specified, uses default minimum opset "
+ "(19 for fp16 scales support, 22 for bf16, 21 for int4, 23 for nvfp4). The opset may be automatically increased "
"if certain operations require a higher version."Also applies to: 364-364
🤖 Prompt for AI Agents
In `@modelopt/onnx/quantization/__main__.py` around lines 289 - 297, Update the
help text for the "--opset" argument added via argparser.add_argument to include
bf16 minimum opset info: mention that --high_precision_dtype bf16 may require
opset 22 (in addition to the existing note about 19 for fp16, 21 for int4, 23
for nvfp4), and make the same change to the other duplicate "--opset" help
string found elsewhere in this module; ensure the message clearly states that
opset may be automatically increased if required by operations.
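To illustrate the flag discussed above, a hypothetical invocation might look like the following; only --opset and --high_precision_dtype appear verbatim in this review, so the remaining flag names are assumptions based on the surrounding discussion.

python -m modelopt.onnx.quantization \
    --onnx_path model.onnx \
    --quantize_mode int8 \
    --high_precision_dtype fp16 \
    --opset 21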
# Opset 19 is the minimum required for fp16 scales in Q/DQ nodes
# Higher opsets required for specific quantization modes (int4: 21, nvfp4: 23)
original_opset_version = get_opset_version(onnx_model)

required_opset_version = 13
if opset_version < required_opset_version and opset_version != 1:
    opset_version = required_opset_version
    onnx_model = onnx.version_converter.convert_version(onnx_model, opset_version)
    onnx_path = os.path.join(output_dir, f"{model_name}_opset{opset_version}.onnx")

# Determine minimum required opset based on quantization mode
mode_min_opset = QDQ_PRECISION_MIN_OPSET.get(quantize_mode, BASE_MIN_OPSET)
🧩 Analysis chain
🏁 Script executed:
wc -l modelopt/onnx/quantization/quantize.py
🏁 Script executed:
cat -n modelopt/onnx/quantization/quantize.py | sed -n '110,150p'
🏁 Script executed:
rg -n "QDQ_PRECISION_MIN_OPSET|BASE_MIN_OPSET" modelopt/onnx/quantization/ -A 3Repository: NVIDIA/Model-Optimizer
Length of output: 719
🏁 Script executed:
rg -n '"int4"|"nvfp4"|"fp4"' modelopt/onnx/quantization/ -B 2 -A 2Repository: NVIDIA/Model-Optimizer
Length of output: 798
🏁 Script executed:
rg -n 'quantize_mode.*"int4"|"int4".*in.*quantize_mode' modelopt/onnx/ -B 2 -A 2
🏁 Script executed:
rg -n "QDQ_PRECISION_MIN_OPSET\s*=" modelopt/onnx/ -A 10Repository: NVIDIA/Model-Optimizer
Length of output: 657
🏁 Script executed:
rg -n "int4_awq|int4_gptq|int4_woq|nvfp4" modelopt/onnx/Repository: NVIDIA/Model-Optimizer
Length of output: 1301
🏁 Script executed:
rg -n "quantize_mode" modelopt/onnx/quantization/__init__.py modelopt/onnx/quantization/__main__.py -B 2 -A 2Repository: NVIDIA/Model-Optimizer
Length of output: 1216
🏁 Script executed:
rg -n "def quantize\(" modelopt/onnx/quantization/quantize.py -A 15Repository: NVIDIA/Model-Optimizer
Length of output: 816
🏁 Script executed:
rg -n "nvfp4_exporter|nvfp4" modelopt/onnx/ -B 3 -A 3Repository: NVIDIA/Model-Optimizer
Length of output: 4595
🏁 Script executed:
rg -n "int4_awq" modelopt/onnx/quantization/quantize.py -B 3 -A 3Repository: NVIDIA/Model-Optimizer
Length of output: 48
🏁 Script executed:
cat -n modelopt/onnx/quantization/quantize.py | sed -n '500,560p'
Normalize quantize_mode before min‑opset lookup.
The code at line 523 uses substring matching (elif "int4" in quantize_mode), implying variant strings are supported. However, the min‑opset lookup at line 129 uses exact key matching, so variant values like int4_awq will incorrectly fall back to BASE_MIN_OPSET (19) instead of the required 21. Additionally, the documentation references opset 23 for nvfp4, but that key doesn't exist in QDQ_PRECISION_MIN_OPSET; only float4_e2m1fn is present. This creates a mismatch where int4 variants could proceed with insufficient opset. Normalize quantize_mode before lookup to handle variants consistently.
🔧 Suggested fix
-    mode_min_opset = QDQ_PRECISION_MIN_OPSET.get(quantize_mode, BASE_MIN_OPSET)
+    if "int4" in quantize_mode:
+        mode_min_opset = QDQ_PRECISION_MIN_OPSET["int4"]
+    elif quantize_mode in QDQ_PRECISION_MIN_OPSET:
+        mode_min_opset = QDQ_PRECISION_MIN_OPSET[quantize_mode]
+    else:
+        mode_min_opset = BASE_MIN_OPSET

🤖 Prompt for AI Agents
+ mode_min_opset = BASE_MIN_OPSET🤖 Prompt for AI Agents
In `@modelopt/onnx/quantization/quantize.py` around lines 124 - 130, The min-opset
lookup uses exact keys so variants like "int4_awq" fall back to BASE_MIN_OPSET;
normalize quantize_mode before querying QDQ_PRECISION_MIN_OPSET: compute a
normalized_mode (e.g., if "int4" in quantize_mode -> "int4", if "nvfp4" in
quantize_mode or "float4" variant -> "float4_e2m1fn", etc.), then use
QDQ_PRECISION_MIN_OPSET.get(normalized_mode, BASE_MIN_OPSET) when setting
mode_min_opset; update references in quantize.py that use quantize_mode
(including the substring checks and the get_opset_version flow) to use the
normalized value so variants resolve to the correct minimum opset.
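A self-contained sketch of that normalization idea follows (a sketch only: the constant values mirror the 19/21/23 numbers quoted in this review, and real code should import BASE_MIN_OPSET and QDQ_PRECISION_MIN_OPSET from modelopt.onnx.utils instead of redefining them).

# Sketch only: normalize variant mode strings before the min-opset lookup.
BASE_MIN_OPSET = 19  # assumed value, per this review
QDQ_PRECISION_MIN_OPSET = {"int4": 21, "float4_e2m1fn": 23}  # assumed subset of the real table

def normalize_quantize_mode(quantize_mode: str) -> str:
    """Map variants such as 'int4_awq' onto the lookup table's canonical keys."""
    if "int4" in quantize_mode:
        return "int4"
    if "nvfp4" in quantize_mode or "float4" in quantize_mode:
        return "float4_e2m1fn"
    return quantize_mode

def min_opset_for_mode(quantize_mode: str) -> int:
    return QDQ_PRECISION_MIN_OPSET.get(normalize_quantize_mode(quantize_mode), BASE_MIN_OPSET)

assert min_opset_for_mode("int4_awq") == 21   # variant no longer falls back to 19
assert min_opset_for_mode("nvfp4") == 23
assert min_opset_for_mode("int8") == 19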
def get_qdq_precisions(model: onnx.ModelProto) -> set:
    """Gets the Q/DQ precision types present in the model.

    Args:
        model: Loaded in-memory onnx ModelProto.

    Returns:
        set: Set of Q/DQ precision types present in the model (e.g., 'float8_e4m3fn', 'int8',
            'int4', 'float4_e2m1fn').
    """
    graph = gs.import_onnx(model)
    precisions = set()

    # Check for custom 'NVFP4' nodes
    custom_fp4_q_nodes = [node for node in graph.nodes if node.op == "TRT_FP4DynamicQuantize"]
    if custom_fp4_q_nodes:
        precisions.add("float4_e2m1fn")

    # Check for precision in DQ nodes
    dq_nodes = [node for node in graph.nodes if node.op == "DequantizeLinear"]
    for dq_node in dq_nodes:
        if len(dq_node.inputs) >= 3 and dq_node.inputs[2] is not None:
            # If zero-point is set, return that as the quantization mode
            if isinstance(dq_node.inputs[2], Constant) and dq_node.inputs[2].values is not None:
                precisions.add(dq_node.inputs[2].values.dtype.name)
        elif isinstance(dq_node.inputs[0], Constant) and dq_node.inputs[0].values is not None:
            # Else, return the node's input precision (ex: 'NVFP4' weight quantization)
            precisions.add(dq_node.inputs[0].values.dtype.name)

    return precisions
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Locate call sites and current precision inference usage.
rg -n "get_qdq_precisions" -S
rg -n "QuantizeLinear" modelopt/onnx -SRepository: NVIDIA/Model-Optimizer
Length of output: 5270
🏁 Script executed:
#!/bin/bash
# Check how get_qdq_precisions result is used
sed -n '220,250p' modelopt/onnx/autocast/convert.py
# Check QuantizeLinear output_dtype handling
sed -n '915,935p' modelopt/onnx/utils.py
# Search for function that uses precisions from get_qdq_precisions
rg -n "get_min_opset_for_precisions" -A 10 modelopt/onnxRepository: NVIDIA/Model-Optimizer
Length of output: 5310
🏁 Script executed:
#!/bin/bash
# Read the complete get_qdq_precisions function
sed -n '702,745p' modelopt/onnx/utils.py
# Check if there are any tests for this function
find . -name "*.py" -type f | xargs grep -l "test.*get_qdq_precisions\|get_qdq_precisions.*test" 2>/dev/null | head -5
# Search for QuantizeLinear node processing with output_dtype
rg -n "output_dtype" modelopt/onnx -B 2 -A 2Repository: NVIDIA/Model-Optimizer
Length of output: 4593
🏁 Script executed:
#!/bin/bash
# Check if there are tests showing Q/DQ precision detection
find . -name "*test*.py" -type f | xargs grep -l "get_qdq_precisions\|QuantizeLinear.*output_dtype" 2>/dev/null
# Search for any models or test data with QuantizeLinear nodes
find . -name "*.onnx" -o -name "*quantize*test*" 2>/dev/null | head -10
# Check if Variable nodes are used in Q/DQ paths
rg -n "Variable.*QuantizeLinear|QuantizeLinear.*Variable" modelopt/onnx -B 2 -A 2
# Look for examples of Q/DQ with output_dtype attribute
rg -n "output_dtype.*int4|output_dtype.*float8|int4.*output_dtype|float8.*output_dtype" modelopt/onnx -B 3 -A 3Repository: NVIDIA/Model-Optimizer
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Search more carefully for QuantizeLinear node handling
rg -n "for.*node.*graph" modelopt/onnx/utils.py -A 5 | grep -A 5 "QuantizeLinear"
# Check what types of inputs QuantizeLinear nodes typically have
rg -n "node.op.*QuantizeLinear\|node.op_type.*QuantizeLinear" modelopt/onnx -B 3 -A 3 | head -40
# Look at the quantization module to understand Q/DQ structure
cat modelopt/onnx/quantization/fp8.py | head -150
🏁 Script executed:
#!/bin/bash
# Verify the actual impact by checking if output_dtype is ever set on QuantizeLinear
rg -n "TRT_INT4QuantizeLinear\|output_dtype.*=\|output_dtype.*:" modelopt/onnx/quantization -B 2 -A 2 | head -50
# Check if there are any activation quantization examples
rg -n "activation.*quantize\|activation.*Q/DQ" modelopt/onnx -i | head -20Repository: NVIDIA/Model-Optimizer
Length of output: 48
get_qdq_precisions misses QuantizeLinear nodes and non-constant Q/DQ paths.
The function only checks DequantizeLinear nodes and only when inputs/zero-points are Constant types. It never processes QuantizeLinear nodes or handles Variable-typed inputs, which are typical for activations. This causes activation Q/DQ and weight QuantizeLinear with non-constant parameters to be under-reported. Since this set is used for opset version selection (e.g., int4 requires opset 21, float4_e2m1fn requires opset 23), insufficient precision detection could result in incompatible opset versions being selected. Add checks for QuantizeLinear.output_dtype attribute and consider inspecting Variables or value_info types to cover non-constant inputs.
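A minimal sketch of the QuantizeLinear side of such a check is shown below, assuming onnx_graphsurgeon is available (as in the existing function) and that onnx.helper.tensor_dtype_to_np_dtype yields dtype names matching the strings used above for the relevant low-precision types.

# Hypothetical helper: collect precisions from QuantizeLinear output_dtype attributes.
import onnx
import onnx_graphsurgeon as gs

def quantizelinear_output_precisions(model: onnx.ModelProto) -> set:
    graph = gs.import_onnx(model)
    precisions = set()
    for node in graph.nodes:
        if node.op != "QuantizeLinear":
            continue
        output_dtype = node.attrs.get("output_dtype")  # ONNX TensorProto enum, if present
        if output_dtype is not None:
            # Map the enum to a numpy dtype name such as 'int8' or 'float8_e4m3fn'
            precisions.add(onnx.helper.tensor_dtype_to_np_dtype(output_dtype).name)
    return precisions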
🤖 Prompt for AI Agents
In `@modelopt/onnx/utils.py` around lines 702 - 731, get_qdq_precisions currently
only inspects DequantizeLinear nodes with Constant inputs and misses
QuantizeLinear nodes and non-constant/Variable paths (activations), causing
under-reporting of precisions; update get_qdq_precisions to also iterate
QuantizeLinear nodes and extract precision from their output_dtype attribute
where present, and for both QuantizeLinear and DequantizeLinear handle Variable
inputs by resolving the tensor type via the graph/model value_info or
node.output type (e.g., check graph.value_info / model.graph.value_info /
model.graph.input/output types for the corresponding tensor and use its
elem_type/name), while still keeping the existing Constant-path logic
(Constant.values.dtype.name) and preserving detection of custom nodes like
TRT_FP4DynamicQuantize.
Codecov Report

❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #809      +/-   ##
==========================================
+ Coverage   74.18%   74.86%   +0.68%
==========================================
  Files         192      192
  Lines       19236    19279      +43
==========================================
+ Hits        14271    14434     +163
+ Misses       4965     4845     -120

☔ View full report in Codecov by Sentry.
| "onnxruntime~=1.22.0 ; platform_machine == 'aarch64' or platform_system == 'Darwin'", | ||
| "onnxruntime-gpu~=1.22.0 ; platform_machine != 'aarch64' and platform_system != 'Darwin'", | ||
| "onnxruntime~=1.23.0 ; platform_machine == 'aarch64' or platform_system == 'Darwin'", | ||
| "onnxruntime-gpu~=1.23.0 ; platform_machine != 'aarch64' and platform_system != 'Darwin'", |
Let's leave windows version unchanged as they saw some regressions. cc @hthadicherla
| "onnxruntime-gpu~=1.23.0 ; platform_machine != 'aarch64' and platform_system != 'Darwin'", | |
| "onnxruntime-gpu~=1.23.0 ; platform_machine != 'aarch64' and platform_system != 'Darwin' and platform_system != 'Windows'", # noqa: E501 | |
| "onnxruntime-gpu==1.22.0; platform_system == 'Windows'", |
So the regression was initially observed by @ajrasane. He saw that quantizing with the latest ort was causing accuracy degradations in some vision models on Linux. When I tested these models later, I found the exact same regressions on Windows.
Better to leave it at 1.22 in setup.py. In the LLM quantization examples, we reinstall the latest ort version by having ort==1.23 in requirements.txt.
What does this PR do?
Type of change: new feature
Overview:
Testing
Added unit tests
tests/unit/onnx/test_quantize_api.py::test_opset_below_minimum_upgrades_to_minimum[int8] PASSED [ 11%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_minimum_upgrades_to_minimum[fp8] PASSED [ 22%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_minimum_upgrades_to_minimum[int4] PASSED [ 33%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_original_uses_original[int8] PASSED [ 44%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_original_uses_original[fp8] PASSED [ 55%]
tests/unit/onnx/test_quantize_api.py::test_opset_below_original_uses_original[int4] PASSED [ 66%]
tests/unit/onnx/test_quantize_api.py::test_opset_above_minimum[int8] PASSED [ 77%]
tests/unit/onnx/test_quantize_api.py::test_opset_above_minimum[fp8] PASSED [ 88%]
tests/unit/onnx/test_quantize_api.py::test_opset_above_minimum[int4] PASSED [100%]
Before your PR is "Ready for review"
Additional Information
Requested as a WAR for a Windows-onnxruntime issue in 5525939, but regardless, it's a useful feature to have.
Summary by CodeRabbit
New Features
--opset CLI option enabling users to specify target ONNX opset version when quantizing models.
Tests