Is your feature request related to a problem? Please describe.
The current FaceDetection plugin appears to run face detector inference on every video frame, while detectionInterval only throttles state handling / callbacks. This makes it difficult to use FaceDetection on low-end devices or in TRTC applications that need very low-overhead face-presence detection.
From the published trtc-sdk-v5@5.18.2 package:
- plugins/video-effect/face-detection/face-detection.esm.js: detectFace() calls getCurrentStatus() before checking _detectionInterval.
- getCurrentStatus() calls this._visionTaskRegistry.getResult(this._faceDetectorHash).detections and then resetHashResults().
- assets/mediapipe/vision.js: FaceDetector is mapped to instance.detectForVideo.bind(instance).
- VisionTaskRegistry.reasoning() calls the reasoning function with video and performance.now().
- detectFace() schedules itself again with video.requestVideoFrameCallback(this.detectFace.bind(this)).
So with a 30 FPS camera stream, detectForVideo may run close to 30 times per second even if detectionInterval is set to a much larger value.
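To make the effect concrete, here is a simplified model of the call order described above. It is not the plugin's actual code: simulate and its parameters are hypothetical names, and the model only assumes that inference happens on every frame before the interval check.

```javascript
// Simplified model of the observed call order; NOT the plugin's real implementation.
// All names here (simulate, detectionIntervalMs, ...) are illustrative assumptions.
function simulate({ fps, durationMs, detectionIntervalMs }) {
  let inferences = 0;
  let callbacks = 0;
  let lastCallbackMs = -Infinity;
  const frames = Math.floor((durationMs * fps) / 1000);

  for (let i = 0; i < frames; i++) {
    const nowMs = (i * 1000) / fps;

    // Inference runs first, on every frame (the getCurrentStatus() step).
    inferences++;

    // Only afterwards is the interval checked, so it throttles callbacks only.
    if (nowMs - lastCallbackMs >= detectionIntervalMs) {
      lastCallbackMs = nowMs;
      callbacks++;
    }
  }
  return { inferences, callbacks };
}

// One second of a 30 FPS stream with a 500 ms detection interval.
const result = simulate({ fps: 30, durationMs: 1000, detectionIntervalMs: 500 });
console.log(result); // { inferences: 30, callbacks: 2 }
```

In this model the interval reduces callbacks from 30 to 2, but the inference count stays at 30.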
Describe the solution you'd like
Please add options to the FaceDetection plugin to control actual inference workload:
- inferenceFps or fps: limit how often detectForVideo is called, for example 1-5 FPS for lightweight face-presence checks.
- inputResolution: allow downsampling before inference, for example { width: 96, height: 72 }, { width: 160, height: 120 }, or similar.
- Keep detectionInterval for callback/state debounce, or clarify its documentation if it is not intended to throttle model inference.
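As a rough sketch of how these options could be resolved with backward-compatible defaults: resolveInferenceOptions is a hypothetical helper, and inferenceFps / inputResolution are the names proposed in this issue, not existing plugin options.

```javascript
// Hypothetical option handling for the proposed options; not part of the current plugin API.
function resolveInferenceOptions(options = {}) {
  const { inferenceFps = Infinity, inputResolution = null } = options;

  // Rejects 0, negative values and NaN.
  if (!(inferenceFps > 0)) {
    throw new RangeError('inferenceFps must be a positive number');
  }

  if (inputResolution !== null) {
    const { width, height } = inputResolution;
    const valid =
      Number.isInteger(width) && Number.isInteger(height) && width > 0 && height > 0;
    if (!valid) {
      throw new RangeError('inputResolution needs positive integer width and height');
    }
  }

  return {
    // Infinity means "no cap": the gap is 0 ms, so every frame is inferred as today.
    minInferenceGapMs: 1000 / inferenceFps,
    // null means "no downsampling": the video element would be passed through unchanged.
    inputResolution,
  };
}

console.log(resolveInferenceOptions());
console.log(resolveInferenceOptions({ inferenceFps: 2, inputResolution: { width: 160, height: 120 } }));
```

With no options supplied, the resolved values describe the current behavior (no cap, no downsampling), so existing users would not be affected.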
A possible implementation would be:
- Move the time check before getCurrentStatus(), so expensive inference is skipped until the next inference window.
- Optionally use a canvas / OffscreenCanvas as the FaceDetector input when inputResolution is provided.
- Preserve the current behavior as the default to avoid breaking existing users.
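The steps above could look roughly like the following sketch. All function names (createInferenceGate, downsampleFrame, onFrame) and the state object are hypothetical; detect stands in for whatever the plugin uses to call the detector, and the real plugin internals may be structured differently. In a browser, the canvas would be created once, for example with new OffscreenCanvas(width, height); the simulation at the bottom runs without one.

```javascript
// Hypothetical helper: returns a function that tells whether a frame is due for inference.
function createInferenceGate(inferenceFps) {
  const minGapMs = 1000 / inferenceFps; // Infinity FPS gives a 0 ms gap (no throttling)
  let lastRunMs = -Infinity;
  return function shouldInfer(nowMs) {
    if (nowMs - lastRunMs < minGapMs) return false; // still inside the current window
    lastRunMs = nowMs;
    return true;
  };
}

// Browser-only, hypothetical helper: scales the current frame into a small reusable canvas.
// It is defined here but not called in the simulation below.
function downsampleFrame(video, canvas) {
  const ctx = canvas.getContext('2d');
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  return canvas;
}

// Sketch of the reordered per-frame step: time check first, inference second.
function onFrame(state, nowMs, video) {
  if (!state.shouldInfer(nowMs)) return null; // skip the expensive work entirely
  const input = state.canvas ? downsampleFrame(video, state.canvas) : video;
  return state.detect(input, nowMs);
}

// Simulate one second of a 30 FPS stream with a 2 FPS cap and a fake detector.
let inferenceCount = 0;
const state = {
  shouldInfer: createInferenceGate(2),
  canvas: null, // no inputResolution given, so the frame is passed through
  detect: () => {
    inferenceCount++;
    return { detections: [] };
  },
};
for (let frame = 0; frame < 30; frame++) {
  onFrame(state, (frame * 1000) / 30, null);
}
console.log(inferenceCount); // 2
```

Passing Infinity as the cap makes the gate admit every frame, which matches the "preserve the current behavior as the default" point.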
Describe alternatives you've considered
Applications can implement a custom MediaPipe FaceDetector pipeline with low-resolution sampling and low FPS, but this duplicates functionality already provided by the TRTC plugin and requires separate handling of camera/video resources.
Additional context
This is useful for applications that already use TRTC and only need a simple hasFace: boolean signal, especially when running alongside audio/video publishing on low-end clients. In such cases, 1-2 FPS and a small input resolution are often enough and can significantly reduce CPU usage.