Distinguish Important from Junk Mail #101

Description

@yushanwe

Feature Description
A system that categorizes pieces of mail as important or junk based on photos of them.

Problem It Solves
The user cannot easily distinguish important mail from junk mail.

Proposed Solution
A mail categorization system that labels each photographed piece of mail as likely important or likely junk.

Implementation details

Assume each generated tool implements one user-facing task. If this issue enumerates multiple stages, execute one ordered copilot_llm_call(...) per stage and explicitly pass useful structured artifacts to later calls with metadata={"previous_stage_artifact": ...}. Use the stage capability as capability. Choose only from these capabilities: general_reasoning, ocr, object_detection_localization, structured_visual_understanding, spatial_reasoning, navigation, camera_motion, or temporal_reasoning. Never use visual_reasoning. The backend may evaluate and escalate reasoning capabilities according to the execution policy. Generated tools must not choose implementations, models, providers, detector backends, fallback order, retries, or verification logic. Do not implement detection, OCR, VLM, LLM, model loading, or provider calls inside generated tool files. Generated tools must not create routers, capability registries, detector/OCR/LLM wrappers, new model-router clients, provider-specific DEFAULT_MODEL constants, COCO_CLASSES, .pt model loading/discovery logic, or direct provider calls.

Alternatives Considered

Example usage
Take photos of mail and tell the user which pieces are likely important and which are likely junk.

Live Mode
no

Live Query

Additional Context
Custom GPT: No
Unless otherwise specified, in streaming mode, any verbal/text response should be limited to 15 words. No such limit applies to one-shot output.
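
The word limit above could be enforced with a small helper along these lines. This is only a sketch: the function name `limit_response`, its `streaming` flag, and the choice to truncate rather than rephrase are assumptions, not part of this issue.

```python
def limit_response(text: str, streaming: bool, max_words: int = 15) -> str:
    """Cap a verbal/text response at max_words in streaming mode.

    Hypothetical helper: the issue states the 15-word limit but does not
    say how it should be enforced. One-shot output is returned unchanged.
    """
    if not streaming:
        return text
    words = text.split()
    if len(words) <= max_words:
        return text
    # Keep only the first max_words whitespace-separated words.
    return " ".join(words[:max_words])
```

Since Live Mode is "no" for this issue, the tool would normally take the one-shot path and return the full text.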

Task Stages

Stage 1

  • Goal: Detect objects.
  • Capability: object_detection_localization

Stage 2

  • Goal: Read text.
  • Capability: ocr

Stage 3

  • Goal: Build a high-level semantic understanding of the detected objects.
  • Capability: general_reasoning
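
The three stages above might be wired together roughly as follows. This is a minimal sketch under stated assumptions: the issue names `copilot_llm_call(...)`, `capability`, and `metadata={"previous_stage_artifact": ...}`, but the `prompt` and `image` keyword arguments, the function name `categorize_mail`, and passing the call in as a parameter are all hypothetical choices made so the example runs without any backend.

```python
from typing import Any, Callable

# Assumed shape of copilot_llm_call: keyword arguments in, artifact out.
# The real signature is not given in this issue.
LlmCall = Callable[..., Any]


def categorize_mail(image: bytes, copilot_llm_call: LlmCall) -> Any:
    """Run the three stages in order, passing artifacts forward."""
    # Stage 1: detect objects (the pieces of mail in the photo).
    detections = copilot_llm_call(
        prompt="Detect each piece of mail in the photo.",
        image=image,
        capability="object_detection_localization",
    )

    # Stage 2: read text, with the Stage 1 artifact passed along.
    text = copilot_llm_call(
        prompt="Read the text on each detected piece of mail.",
        image=image,
        capability="ocr",
        metadata={"previous_stage_artifact": detections},
    )

    # Stage 3: reason over both earlier artifacts to label each piece.
    return copilot_llm_call(
        prompt="Label each piece of mail as likely important or likely junk.",
        capability="general_reasoning",
        metadata={
            "previous_stage_artifact": {"detections": detections, "text": text}
        },
    )
```

The sketch makes exactly one call per stage and leaves models, providers, retries, and verification to the backend. Taking `copilot_llm_call` as an argument is only a convenience for testing with a stand-in function; a generated tool would presumably use whatever call the tools folder already provides.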

Write the code for this tool inside the tools folder. Assume the tool implements one user-facing task. For a Task Stages section, make one ordered copilot_llm_call(...) per stage and explicitly pass useful artifacts to later calls. Use only these capabilities: general_reasoning, ocr, object_detection_localization, structured_visual_understanding, spatial_reasoning, navigation, camera_motion, temporal_reasoning. Never use visual_reasoning. Do not import litellm, call litellm.completion(), create new model-router clients, create ModelRouter classes, resolve model names, resolve API keys, import detector libraries/provider SDKs/YOLO, define COCO_CLASSES, hardcode model names, load .pt files, or call YOLO(...). Do not select implementations, providers, fallback order, retries, or verification logic in generated tools.

Metadata

Labels

enhancement: New feature or request

