Non-default 'decibel_thres' value causes FSMN VAD model crash #2780

@steven8274

Description

🐛 Bug

If 'decibel_thres' of the 'FsmnVADStreaming' class is set to a value higher than the default (for example, -50), the model may crash at this line of 'funasr/models/fsmn_vad_streaming/model.py':

cur_decibel = cache["stats"].decibel[t]

The reported error is:

IndexError: list index out of range

This problem occurs in versions 1.1.12, 1.1.14, and 1.3.0 (and possibly in every version from 1.1.12 through 1.3.0).
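The failing access can be illustrated in isolation. The sketch below is not FunASR code: `decibel_at` and its `fallback` value are hypothetical names, and it assumes the crash happens because the frame index `t` runs past the end of the per-frame decibel list. It shows the unguarded lookup raising the same IndexError and a bounds-checked lookup that does not.

```python
from typing import Sequence


def decibel_at(decibels: Sequence[float], t: int, fallback: float = -100.0) -> float:
    """Return the decibel value for frame t, or `fallback` when t is out of range.

    Hypothetical helper: it only mirrors the failing access
    `cache["stats"].decibel[t]`; it is not part of FunASR.
    """
    if 0 <= t < len(decibels):
        return decibels[t]
    return fallback


# Three frames of made-up decibel values; the caller asks for frame index 3.
decibels = [-62.5, -48.0, -51.3]

try:
    decibels[3]  # unguarded lookup, same shape as the crashing line
    raised = False
except IndexError:
    raised = True

# The guarded lookup returns the fallback instead of raising.
guarded = decibel_at(decibels, 3)
```

Whether falling back to a default is an appropriate fix depends on why the frame index and the list length diverge, which this report does not establish.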

To Reproduce

1. Set 'decibel_thres' of FsmnVADStreaming to -50.
2. Use the VAD pipeline to segment audio.

Code sample

import soundfile as sf
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
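import torch

# 'device' was not defined in the original snippet; use the GPU when available.
device = "cuda:0" if torch.cuda.is_available() else "cpu"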

vad_inference_pipeline = pipeline(
    task=Tasks.voice_activity_detection,
    model='iic/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    model_revision="v2.0.4",
    disable_update=True,
    device=device,
    decibel_thres=-50
)

wav, _ = sf.read("video_clip2_16k.wav", dtype="int16")
segments_result = vad_inference_pipeline(input=wav.tobytes(), fs=16000)

Expected behavior

The pipeline returns the correct audio segments without raising an error.

Environment

  • OS: Linux
  • FunASR version: 1.1.12, 1.1.14, 1.3.0
  • ModelScope version: 1.19.0
  • PyTorch version: 2.0.1+cu118
  • Installation method: pip
  • Python version: 3.8.20
  • GPU: 4090
  • CUDA/cuDNN version: CUDA 12.1

video_clip2_16k.wav
