[Feature] FaceDetection plugin should support real inference FPS and input resolution control #159

Description

@guibubing

Is your feature request related to a problem? Please describe.

The current FaceDetection plugin appears to run face-detector inference on every video frame, while detectionInterval only throttles state handling and callbacks. This makes FaceDetection hard to use on low-end devices or in TRTC applications that need very low-overhead face-presence detection.

From the published trtc-sdk-v5@5.18.2 package:

  • plugins/video-effect/face-detection/face-detection.esm.js: detectFace() calls getCurrentStatus() before checking _detectionInterval.
  • getCurrentStatus() calls this._visionTaskRegistry.getResult(this._faceDetectorHash).detections and then resetHashResults().
  • assets/mediapipe/vision.js: FaceDetector is mapped to instance.detectForVideo.bind(instance).
  • VisionTaskRegistry.reasoning() calls the reasoning function with video and performance.now().
  • detectFace() schedules itself again with video.requestVideoFrameCallback(this.detectFace.bind(this)).

So with a 30 FPS camera stream, detectForVideo may run close to 30 times per second even if detectionInterval is set to a much larger value.
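
To make the numbers concrete, here is a simplified model of the loop described above. It is not the plugin's source: `simulateCurrentLoop` and its parameters are hypothetical, and it assumes detectionInterval is expressed in milliseconds.

```javascript
// Simplified model of the observed control flow (NOT the plugin's code).
// Assumption: one requestVideoFrameCallback tick per camera frame, and
// detectionInterval measured in milliseconds.
function simulateCurrentLoop({ cameraFps, detectionIntervalMs, durationMs }) {
  const frameCount = Math.floor((durationMs * cameraFps) / 1000);
  let inferenceCalls = 0;
  let stateUpdates = 0;
  let lastUpdateMs = -Infinity;

  for (let i = 0; i < frameCount; i++) {
    const nowMs = (i * 1000) / cameraFps;

    // Inference happens first, on every frame (getCurrentStatus()).
    inferenceCalls += 1;

    // Only afterwards is the interval checked, so it gates state handling only.
    if (nowMs - lastUpdateMs >= detectionIntervalMs) {
      stateUpdates += 1;
      lastUpdateMs = nowMs;
    }
  }
  return { inferenceCalls, stateUpdates };
}

const result = simulateCurrentLoop({
  cameraFps: 30,
  detectionIntervalMs: 1000,
  durationMs: 10000,
});
console.log(result); // { inferenceCalls: 300, stateUpdates: 10 }
```

Under these assumptions, ten seconds of video produce 300 inference calls but only 10 state updates, which is the gap this request is about.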

Describe the solution you'd like

Please add options to the FaceDetection plugin to control actual inference workload:

  1. inferenceFps or fps: limit how often detectForVideo is called, for example 1-5 FPS for lightweight face-presence checks.
  2. inputResolution: allow downsampling before inference, for example { width: 96, height: 72 }, { width: 160, height: 120 }, or similar.
  3. Keep detectionInterval for callback/state debounce, or clarify its documentation if it is not intended to throttle model inference.
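
One way the three options could fit together is sketched below. The function name `resolveFaceDetectionOptions` and the returned field names are hypothetical, and `inferenceFps` / `inputResolution` are the names proposed in this issue, not existing TRTC API. Omitting an option is assumed to mean "keep today's behavior".

```javascript
// Hypothetical option normalization for the proposal above.
// Only detectionInterval is known to exist today; the rest is illustrative.
function resolveFaceDetectionOptions(options = {}) {
  const { inferenceFps, inputResolution, detectionInterval } = options;

  if (inferenceFps != null && !(Number.isFinite(inferenceFps) && inferenceFps > 0)) {
    throw new RangeError('inferenceFps must be a positive number');
  }
  if (inputResolution != null) {
    const { width, height } = inputResolution;
    const valid =
      Number.isInteger(width) && width > 0 &&
      Number.isInteger(height) && height > 0;
    if (!valid) {
      throw new RangeError('inputResolution needs positive integer width and height');
    }
  }

  return {
    // null: run inference on every frame, as the plugin appears to do now.
    minInferenceGapMs: inferenceFps == null ? null : 1000 / inferenceFps,
    // null: hand the video frame to the detector without downsampling.
    inputResolution: inputResolution == null ? null : { ...inputResolution },
    // Passed through unchanged: still only debounces callbacks/state.
    detectionInterval,
  };
}

// Example: a lightweight face-presence configuration.
const resolved = resolveFaceDetectionOptions({
  inferenceFps: 2,
  inputResolution: { width: 160, height: 120 },
});
console.log(resolved.minInferenceGapMs); // 500
```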

A possible implementation would be:

  • Move the time check before getCurrentStatus(), so expensive inference is skipped until the next inference window.
  • Optionally use a canvas / OffscreenCanvas as the FaceDetector input when inputResolution is provided.
  • Preserve the current behavior as the default to avoid breaking existing users.
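
The first two points could look roughly like the sketch below. All function names here are hypothetical, `detect` stands in for whatever call ultimately reaches detectForVideo, and the sketch assumes the detector accepts a canvas as input in the same way it accepts a video element.

```javascript
// Returns a predicate that is true at most about `inferenceFps` times per second.
function createInferenceGate(inferenceFps) {
  const minGapMs = 1000 / inferenceFps;
  let lastRunMs = -Infinity;
  return function shouldRunInference(nowMs) {
    if (nowMs - lastRunMs < minGapMs) return false;
    lastRunMs = nowMs;
    return true;
  };
}

// Browser-only helper: draws the current frame into a small OffscreenCanvas.
// Note: this stretches the frame if the aspect ratios differ.
function createDownscaler({ width, height }) {
  const canvas = new OffscreenCanvas(width, height);
  const ctx = canvas.getContext('2d');
  return function downscale(video) {
    ctx.drawImage(video, 0, 0, width, height);
    return canvas;
  };
}

// Hypothetical per-frame handler. `detect(input, nowMs)` is supplied by the
// caller and represents the expensive inference step.
function createFrameHandler({ inferenceFps, inputResolution, detect }) {
  const shouldRun = createInferenceGate(inferenceFps);
  let downscale = null;
  return function onFrame(video, nowMs) {
    // The time check comes BEFORE inference, so skipped frames cost almost nothing.
    if (!shouldRun(nowMs)) return null;
    if (inputResolution && !downscale) {
      downscale = createDownscaler(inputResolution);
    }
    const input = downscale ? downscale(video) : video;
    return detect(input, nowMs);
  };
}
```

In a real loop, `onFrame` would be called from the existing requestVideoFrameCallback handler with `performance.now()` as `nowMs`. Because the gate restarts its window at the frame that actually ran, the effective rate can end up slightly below `inferenceFps` when frame times do not line up with the window.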

Describe alternatives you've considered

Applications can implement a custom MediaPipe FaceDetector pipeline with low-resolution sampling and low FPS, but this duplicates functionality already provided by the TRTC plugin and requires separate handling of camera/video resources.

Additional context

This is useful for applications that already use TRTC and only need a simple hasFace: boolean signal, especially when running alongside audio/video publishing on low-end clients. In such cases, 1-2 FPS and a small input resolution are often enough and can significantly reduce CPU usage.
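
For reference, the boolean signal meant here could be derived as in the minimal sketch below. It assumes a MediaPipe-style result object (`detections`, each with a `categories` array carrying a `score`); the `hasFace` helper and its `minScore` default are hypothetical and say nothing about what the plugin's callback actually passes.

```javascript
// Assumes a MediaPipe-style result: { detections: [{ categories: [{ score }] }] }.
// minScore is an illustrative threshold, not a plugin setting.
function hasFace(result, minScore = 0.5) {
  const detections = (result && result.detections) || [];
  return detections.some((detection) => {
    const top = detection.categories && detection.categories[0];
    const score = top ? top.score : 0;
    return score >= minScore;
  });
}

console.log(hasFace({ detections: [] })); // false
console.log(hasFace({ detections: [{ categories: [{ score: 0.9 }] }] })); // true
```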

Labels: enhancement