🐛 Bug
Generated SRT contains only one subtitle cue with invalid timestamp 00:00:00,000 --> 00:00:00,000.
I used the subtitle generation script on an MP4 video, but the output .srt file contains only a single cue. The cue includes the full transcribed text, but both start and end timestamps are 00:00:00,000.
This makes the SRT unusable for subtitle display or downstream TTS/dubbing workflows, because there is no timing information for individual spoken segments.
To Reproduce
Steps to reproduce the behavior:
- Run the subtitle generation command (video source: https://www.douyin.com/video/7646260767334944046):
  python generate_subtitle.py 7646260767334944046.mp4 -o ./sub.srt --lang zh
- Open the generated sub.srt.
- The output contains only one cue, similar to:
  1
  00:00:00,000 --> 00:00:00,000
  <full transcript text here>
Code sample
No custom code was used. I reproduced the issue using the provided subtitle generation script directly.
Expected behavior
The generated SRT should contain multiple subtitle cues with valid start and end timestamps, for example:
1
00:00:00,000 --> 00:00:03,500
First spoken sentence...

2
00:00:03,500 --> 00:00:07,200
Next spoken sentence...
The subtitle segments should be split according to the actual speech timing in the video/audio.
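For reference, a minimal sketch of the expected output shape, assuming the pipeline can supply per-sentence segments as dicts with `start`/`end` in milliseconds and `text`. That segment structure and the helper names `ms_to_srt_time` and `segments_to_srt` are hypothetical, not necessarily what `generate_subtitle.py` or FunASR produces. The zero-duration guard would turn the behavior reported here into an explicit error instead of a silently broken file.

```python
from typing import Iterable, Mapping


def ms_to_srt_time(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(int(ms), 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Build SRT text from segments with 'start'/'end' (ms) and 'text'.

    Assumed segment shape, for illustration only: adapt it to whatever
    per-sentence timing the ASR pipeline actually returns.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        start, end = int(seg["start"]), int(seg["end"])
        # Refuse to emit cues like 00:00:00,000 --> 00:00:00,000.
        if end <= start:
            raise ValueError(f"cue {index} has non-positive duration: {start} -> {end}")
        cues.append(
            f"{index}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n{seg['text'].strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(cues)
```

With `[{"start": 0, "end": 3500, "text": "First spoken sentence..."}, {"start": 3500, "end": 7200, "text": "Next spoken sentence..."}]` as input, this produces exactly the two cues shown in the expected output.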
Error logs
No Python traceback or runtime error was shown.
The issue is in the generated SRT output:
- Only one subtitle cue is generated.
- The timestamp is always 00:00:00,000 --> 00:00:00,000.
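To check the output quickly across runs, a small heuristic checker may help. This is a sketch (`check_srt` is a hypothetical helper, not part of FunASR) that scans the SRT timing lines and flags a lone cue or any cue whose end is not after its start. A single cue is only a hint, since a very short clip could legitimately produce one.

```python
import re
from typing import List

# Matches an SRT timing line such as "00:00:03,500 --> 00:00:07,200".
TIMING_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert SRT timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_srt(text: str) -> List[str]:
    """Return a list of problems found in SRT text (empty if none)."""
    problems = []
    timings = [
        (_to_ms(*m.groups()[:4]), _to_ms(*m.groups()[4:]))
        for m in TIMING_RE.finditer(text)
    ]
    if len(timings) <= 1:
        problems.append(f"only {len(timings)} cue(s) found")
    for i, (start, end) in enumerate(timings, start=1):
        if end <= start:
            problems.append(f"cue {i}: end {end} ms is not after start {start} ms")
    return problems
```

Running `check_srt(open("sub.srt", encoding="utf-8").read())` on the output described above should report both the single cue and the zero-length timing.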
Environment
- OS: macOS
- Machine: MacBook M5
- Python version: Not confirmed
- FunASR version: Not confirmed
- ModelScope version: Not confirmed
- PyTorch / torchaudio version: Not confirmed
- Install method (pip, source, Docker): Not confirmed
- Device (cuda, cpu, mps): Not confirmed
- GPU model: Apple Silicon / integrated GPU
- CUDA/cuDNN version: N/A
- Docker image tag, if used: N/A
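The unconfirmed version fields could be filled in with a short script like the following. It is a sketch using only the standard library plus `torch` if it is installed; the distribution names `funasr`, `modelscope`, `torch` and `torchaudio` are the usual PyPI names and are assumed here, and `package_versions`/`report` are hypothetical helper names.

```python
import platform
import sys
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report():
    """Print the environment fields requested by the issue template."""
    print("Python:", sys.version.split()[0])
    print("Platform:", platform.platform(), platform.machine())
    for name, version in package_versions(["funasr", "modelscope", "torch", "torchaudio"]).items():
        print(f"{name}: {version or 'not installed'}")
    try:
        import torch
        # On Apple Silicon this shows whether the MPS backend is usable.
        print("MPS available:", torch.backends.mps.is_available())
    except (ImportError, AttributeError):
        print("MPS available: unknown (torch missing or too old)")


if __name__ == "__main__":
    report()
```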
Audio details
- Duration: Not confirmed
- Sample rate: Not confirmed
- Format: MP4 video
- Language/dialect: Chinese (--lang zh)
- Speaker count: Not confirmed
- Background noise/music: Not confirmed
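The duration and sample rate could be read from the MP4 with `ffprobe`, assuming FFmpeg is installed. This sketch shells out to `ffprobe` with JSON output and parses the first audio stream; `probe_audio` and `parse_probe` are hypothetical helper names. In `ffprobe`'s JSON output, `sample_rate` and `duration` are strings, so they are converted explicitly.

```python
import json
import subprocess


def parse_probe(raw_json: str) -> dict:
    """Extract duration, sample rate and channel count from ffprobe JSON."""
    data = json.loads(raw_json)
    stream = data["streams"][0] if data.get("streams") else {}
    return {
        "duration_s": float(data["format"]["duration"]),
        "sample_rate": int(stream["sample_rate"]) if "sample_rate" in stream else None,
        "channels": stream.get("channels"),
    }


def probe_audio(path: str) -> dict:
    """Run ffprobe (assumed to be on PATH) on the first audio stream of a file."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=sample_rate,channels:format=duration",
        "-of", "json",
        path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_probe(out)
```

For example, `probe_audio("7646260767334944046.mp4")` would return the values needed for the Duration and Sample rate fields.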
Additional context
This issue is important for subtitle-based TTS/dubbing workflows. When the full transcript is placed into a single cue with timestamp 00:00:00,000 --> 00:00:00,000, downstream voice generation tools cannot preserve pauses, speech timing, or sentence-level alignment.