Generated SRT contains only one subtitle cue with invalid timestamp `00:00:00,000 --> 00:00:00,000`.

## 🐛 Bug

I used the subtitle generation script on an MP4 video, but the output `.srt` file contains only a single cue. The cue includes the full transcribed text, but both start and end timestamps are `00:00:00,000`.

This makes the SRT unusable for subtitle display or downstream TTS/dubbing workflows, because there is no timing information for individual spoken segments.

## To Reproduce

Steps to reproduce the behavior:

1. Run the subtitle generation command:

```bash
python generate_subtitle.py 7646260767334944046.mp4 -o ./sub.srt --lang zh
```
Source video: https://www.douyin.com/video/7646260767334944046

2. Open the generated `sub.srt`.

3. The output contains only one cue, similar to:

```srt
1
00:00:00,000 --> 00:00:00,000
<full transcript text here>
```

## Code sample

No custom code was used; the issue reproduces when running the provided `generate_subtitle.py` script directly.

## Expected behavior

The generated SRT should contain multiple subtitle cues with valid start and end timestamps, for example:

```srt
1
00:00:00,000 --> 00:00:03,500
First spoken sentence...

2
00:00:03,500 --> 00:00:07,200
Next spoken sentence...
```

The subtitle segments should be split according to the actual speech timing in the video/audio.
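To make the expected output concrete, here is a minimal sketch of how per-sentence timings could be turned into SRT cues. It assumes the recognizer can supply `(start_ms, end_ms, text)` tuples. That tuple shape and the names `format_srt_time` and `segments_to_srt` are hypothetical, chosen for illustration, and are not taken from `generate_subtitle.py` or from FunASR.

```python
def format_srt_time(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from (start_ms, end_ms, text) tuples, one cue per segment."""
    cues = []
    for index, (start_ms, end_ms, text) in enumerate(segments, start=1):
        timing = f"{format_srt_time(start_ms)} --> {format_srt_time(end_ms)}"
        cues.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)


# Hypothetical segments that mirror the expected output shown above.
example_segments = [
    (0, 3500, "First spoken sentence..."),
    (3500, 7200, "Next spoken sentence..."),
]
srt_text = segments_to_srt(example_segments)
print(srt_text)
```

If the script only receives one segment with no start or end time, a conversion like this would yield exactly the single zero-length cue described in this report, which may help narrow down where the timing information is lost.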

## Error logs

No Python traceback or runtime error was shown.

The issue is in the generated SRT output:

```text
Only one subtitle cue is generated.
The timestamp is always:
00:00:00,000 --> 00:00:00,000
```
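Since no error is raised, the problem can only be detected by inspecting the output. The following is a small sketch of a checker that counts cues and flags any whose end time is not after its start time. The names `check_srt` and `TIMING_RE` are hypothetical helpers written for this report, not part of the subtitle script.

```python
import re

# Matches an SRT timing line such as "00:00:03,500 --> 00:00:07,200".
TIMING_RE = re.compile(
    r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2,}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_ms(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def check_srt(srt_text):
    """Return (cue_count, bad_cues); bad_cues holds 1-based positions where end <= start."""
    cue_count = 0
    bad_cues = []
    for match in TIMING_RE.finditer(srt_text):
        cue_count += 1
        fields = match.groups()
        start_ms = _to_ms(*fields[:4])
        end_ms = _to_ms(*fields[4:])
        if end_ms <= start_ms:
            bad_cues.append(cue_count)
    return cue_count, bad_cues


# The output reported in this issue: one cue with a zero-length time range.
reported = "1\n00:00:00,000 --> 00:00:00,000\n<full transcript text here>\n"
print(check_srt(reported))  # (1, [1])
```

To check the real file, pass the contents of `sub.srt` (read as UTF-8 text) to `check_srt` instead of the inline string.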

## Environment

* OS: macOS
* Hardware: MacBook (Apple M5)
* Python version: Not confirmed
* FunASR version: Not confirmed
* ModelScope version: Not confirmed
* PyTorch / torchaudio version: Not confirmed
* Install method (`pip`, source, Docker): Not confirmed
* Device (`cuda`, `cpu`, `mps`): Not confirmed
* GPU model: Apple Silicon / integrated GPU
* CUDA/cuDNN version: N/A
* Docker image tag, if used: N/A
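Several fields above are still unconfirmed. A short sketch like the one below could collect them using only the Python standard library. It assumes the packages are installed under the distribution names `funasr`, `modelscope`, `torch`, and `torchaudio`; `package_versions` is a hypothetical helper name.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Map each distribution name to its installed version, or 'not installed'."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


print("Python:", platform.python_version())
print("Machine:", platform.machine())
for name, ver in package_versions(["funasr", "modelscope", "torch", "torchaudio"]).items():
    print(f"{name}: {ver}")
```

Running this in the same environment that produced `sub.srt` would fill in the Python, FunASR, ModelScope, and PyTorch/torchaudio entries.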

## Audio details

* Duration: Not confirmed
* Sample rate: Not confirmed
* Format: MP4 video
* Language/dialect: Chinese (`--lang zh`)
* Speaker count: Not confirmed
* Background noise/music: Not confirmed

## Additional context

This issue is important for subtitle-based TTS/dubbing workflows. When the full transcript is placed into a single cue with timestamp `00:00:00,000 --> 00:00:00,000`, downstream voice generation tools cannot preserve pauses, speech timing, or sentence-level alignment.

