Update SKILL.md #30
/evaluate
🤖 Evaluation Results
❌ Some evaluations failed — 0/1 assets passed
Scenario Details
generate-microcks-openapi-samples (skill) — ❌ FAIL
Model: basic-openapi-sample
Tokens used: 120 input, 1,688 output
Signed-off-by: Sebastien DEGODEZ <sebastien.degodez@gmail.com>
7cca5ab to 68fcb46
🤖 Evaluation Results
❌ Some evaluations failed — 2/4 assets passed (4 skipped due to API errors — not counted)
Scenario Details
csharp-clean-architecture-development (plugin) — ❌ FAIL
Model: scaffold-clean-architecture-solution
cqs-pattern-a-crud-no-bus
ddd-aggregate-invariant-domain-layer
interface-placement-repo-write-vs-read
authorization-placement-three-tiers
fetch-then-check-not-filter-in-query
iron-law-violation-detection
cqrs-bus-optional-conditions
domain-events-declare-vs-dispatch
shared-kernel-dependencies
minimal-context-tools (plugin) — ❌ FAIL
Model: find-files-by-pattern
superpowers-whetstone (plugin) —
| Run | Score | Passed |
|---|---|---|
| 1 | 10.00/10 | ✅ |
| 2 | 8.50/10 | ✅ |
configure-aot-json-serialization
AOT-compatible JSON serialization — skill must guide context and registration
| Run | Score | Passed |
|---|---|---|
| 1 | 10.00/10 | ✅ |
| 2 | 10.00/10 | ✅ |
setup-transport-in-program
Transport configuration — skill must guide SSE vs Stdio setup
| Run | Score | Passed |
|---|---|---|
| 1 | 9.50/10 | ✅ |
| 2 | 9.50/10 | ✅ |
generate-microcks-openapi-samples (skill) — ✅ PASS
Model: gpt-4o
Overall Score: 7.20/10
Pass Rate: 100%
basic-openapi-sample
Generate a Microcks-compatible OpenAPI sample for a simple REST API
| Run | Score | Passed |
|---|---|---|
| 1 | 7.20/10 | ✅ |
| 2 | 7.20/10 | ✅ |
migrating-prompts-to-skills (skill) — ⚠️ SKIPPED
⚠️ All runs failed due to API errors — evaluation was skipped and is not counted in the benchmark.
setup-husky-dotnet (skill) — ⚠️ SKIPPED
⚠️ All runs failed due to API errors — evaluation was skipped and is not counted in the benchmark.
skill-creator (skill) — ⚠️ SKIPPED
⚠️ All runs failed due to API errors — evaluation was skipped and is not counted in the benchmark.
Tokens used: 77,287 input, 11,020 output
Closes #
📑 Description
✅ Checks
ℹ Additional Information