
Commit 39d8917

docs(adr): add architecture decisions for v2.2 and v3.0
Add ADRs documenting key architectural decisions:

- ADR-0009: Remove Data Component from UMS
  - Rationale for removing DataComponent due to semantic ambiguity
- ADR-0010: Foundation Component
  - New component type for philosophical and architectural content
- ADR-0011: Atomic Primitives and URI Scheme
  - Enable granular RAG retrieval with ums:// addressing
- ADR-0012: Layered Cake Assembler
  - Prompt assembly strategy for optimal LLM consumption
- ADR-0013: One Source, Two Runtimes
  - Dual-format output (human-readable + machine-optimized)

These ADRs inform the v2.2 and v3.0 specification development.
1 parent 2bde450 commit 39d8917

5 files changed

Lines changed: 1601 additions & 0 deletions
Lines changed: 261 additions & 0 deletions
# ADR-0009: Remove Data Component from UMS v3.0

**Status:** Accepted
**Date:** 2025-11-25
**Deciders:** Jason Knight
**Related:** UMS v3.0 Specification, ADR-0004 (Machine-First Module Architecture)

---

## Context

In UMS v2.0 through v2.2, modules could include a `DataComponent` for storing reference information such as configuration data, schemas, or structured reference content. The component was designed to hold raw data in various formats (JSON, YAML, XML) with minimal contextual information:

```typescript
interface DataComponent {
  type: "data";
  metadata?: ComponentMetadata;
  data: {
    format: string;       // Media type (json, yaml, xml, etc.)
    description?: string; // What this data represents
    value: unknown;       // The actual data
  };
}
```

**Example Usage in v2.2:**
```typescript
data: {
  id: 'config',
  format: 'json',
  description: 'Database configuration',
  value: {
    host: 'localhost',
    port: 5432,
    maxConnections: 20
  }
}
```

While the Data component appeared useful for reference data, practical experience and the introduction of UMS v3.0's architectural improvements revealed fundamental issues that made it incompatible with the system's design goals.

### Problems Identified

1. **No Clear Cognitive Level**
   - The Cognitive Hierarchy (levels 0-6) classifies content by abstraction, from universal axioms (level 0) to meta-cognition (level 6)
   - Raw data does not fit this hierarchy: it is neither instructional, conceptual, nor foundational
   - Data lacks the pedagogical or operational intent that defines all other components

2. **Vector Search Noise**
   - Raw JSON/YAML produces poor-quality embeddings in vector databases
   - Structured data without prose context generates semantically weak vectors
   - Search queries for concepts or instructions return irrelevant data matches
   - Degrades RAG retrieval precision across the entire module ecosystem

3. **No Clear Zone in Layered Cake Assembler**
   - UMS v3.0 introduces the Layered Cake assembler with 4 attention-optimized zones:
     - Zone 0 (Constitution): Policies + Principles
     - Zone 1 (Context): Patterns + Concepts + References
     - Zone 2 (Action): Procedures + Evaluations
     - Zone 3 (Steering): Demonstrations
   - The Data component cannot be logically placed in any zone
   - Raw data is not constitutional (Zone 0), conceptual (Zone 1), instructional (Zone 2), or exemplary (Zone 3)

4. **TypeScript Type Safety Hole**
   - The `value: unknown` type tells the compiler nothing about the data's shape, so the stored content is never type-checked
   - Defeats the UMS v2.0+ goal of TypeScript-first type safety
   - Consumers must cast or narrow `value` before use, so shape errors surface at runtime instead of compile time
   - Incompatible with the typed primitive system in v3.0

5. **Architectural Mismatch**
   - The Data component was the only component without a clear pedagogical purpose
   - Other components (Foundation, Instruction, Knowledge) teach principles, guide actions, or explain concepts
   - Data simply stores; it doesn't instruct, explain, or guide
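
Problem 4 can be made concrete with a minimal sketch. It reuses the `value: unknown` field from the interface above (metadata omitted); `DataPayload`, `DbConfig`, `readPortUnsafe`, and `isDbConfig` are hypothetical names for illustration only. A typo in the stored value (`prot` instead of `port`) compiles cleanly, an unchecked cast lets it through, and a hand-written guard can only catch it at runtime.

```typescript
// Simplified stand-in for the v2.2 DataComponent payload (metadata omitted).
interface DataPayload {
  format: string;
  description?: string;
  value: unknown; // the compiler knows nothing about this shape
}

// Hypothetical shape a consumer expects the stored value to have.
interface DbConfig {
  host: string;
  port: number;
  maxConnections: number;
}

// The typo "prot" compiles fine because `value` is `unknown`.
const dbData: DataPayload = {
  format: "json",
  description: "Database configuration",
  value: { host: "localhost", prot: 5432, maxConnections: 20 },
};

// Consumers must cast; the cast is unchecked, so the typo survives to runtime.
function readPortUnsafe(payload: DataPayload): number {
  return (payload.value as DbConfig).port;
}

// The best a consumer can do is a hand-written runtime guard.
function isDbConfig(v: unknown): v is DbConfig {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.host === "string" &&
    typeof o.port === "number" &&
    typeof o.maxConnections === "number"
  );
}

console.log(readPortUnsafe(dbData)); // undefined, despite the `number` return type
console.log(isDbConfig(dbData.value)); // false, detected only at runtime
```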

---

## Decision

**Remove the `DataComponent` entirely from UMS v3.0.**

Modules may no longer include a `data` field or Data component. All reference information must be presented with appropriate context using existing Knowledge or Foundation components.

### Key Changes

1. **Removed from Module Schema**
   - `data?: DataComponent` removed from Module interface
   - `DataComponent` type removed from component union
   - No primitive mappings for Data component in v3.0

2. **Migration Path Provided**
   - Convert Data to `Knowledge.examples` (Demonstration primitive)
   - Or convert to `Knowledge.concepts` (Concept primitive)
   - Add contextual explanation and rationale

3. **Updated Component Structure**
   - Three core components remain: Foundation, Instruction, Knowledge
   - All components map cleanly to the 7 atomic primitives
   - All primitives have clear Layered Cake zones
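
The schema changes above can be sketched at the type level. This is an illustration under stated assumptions: the three component interfaces are heavily simplified placeholders (the real v3.0 shapes are defined in the specification), and `ModuleV3` and `describeComponent` are hypothetical names. The sketch shows two effects of the removal: an object literal that still carries `data` is rejected by the compiler, and an exhaustive `switch` over the component union has no "data" case left to handle.

```typescript
// Heavily simplified placeholders; the real v3.0 component shapes are defined in the spec.
interface FoundationComponent {
  type: "foundation";
  principles: string[];
}
interface InstructionComponent {
  type: "instruction";
  steps: string[];
}
interface KnowledgeComponent {
  type: "knowledge";
  explanation: string;
}

// The component union has no "data" member any more.
type Component = FoundationComponent | InstructionComponent | KnowledgeComponent;

// Hypothetical v3.0 module shape: there is no `data?: DataComponent` field.
interface ModuleV3 {
  id: string;
  foundation?: FoundationComponent;
  instruction?: InstructionComponent;
  knowledge?: KnowledgeComponent;
}

// Exhaustive switch: a new, unhandled component type would fail the `never` check.
function describeComponent(c: Component): string {
  switch (c.type) {
    case "foundation":
      return `foundation with ${c.principles.length} principle(s)`;
    case "instruction":
      return `instruction with ${c.steps.length} step(s)`;
    case "knowledge":
      return `knowledge: ${c.explanation}`;
    default: {
      const unreachable: never = c;
      return unreachable;
    }
  }
}

const mod: ModuleV3 = {
  id: "database-setup",
  knowledge: { type: "knowledge", explanation: "Configuration examples for database setup" },
  // @ts-expect-error `data` is no longer part of the module schema
  data: { format: "json", value: {} },
};
```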

---

## Consequences

### Positive

1. **Cleaner Cognitive Hierarchy**
   - All components now fit the 0-6 cognitive level classification
   - No special-case handling for "contentless" components

2. **Improved Vector Search Quality**
   - All embedded content includes prose context and explanation
   - Better semantic relevance in RAG retrieval
   - Reduced false positives in module discovery

3. **Type Safety Restored**
   - No `unknown` types in component structure
   - Full TypeScript inference across all primitives
   - Compile-time validation of module content

4. **Simplified Assembler Logic**
   - All primitives map cleanly to Layered Cake zones
   - No edge cases for "unzoned" content
   - Clearer mental model for module authors

5. **Pedagogical Consistency**
   - All components serve a teaching or guiding purpose
   - Data is always presented with "why" and "how to use"
   - Better learning experience for AI consumers
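
The "no unzoned content" point can be enforced by the compiler. A minimal sketch: keying a `Record` by a union of primitive names makes a missing zone assignment a type error. The primitive spellings below are assumptions derived from the zone list under Problems Identified (the v3.0 specification is authoritative), and `ZONE_OF`, `Atom`, and `assemble` are hypothetical. The sketch also assumes that zones are emitted in ascending order (0 to 3), which this ADR does not itself specify.

```typescript
// Primitive names as listed in this ADR's Layered Cake zone breakdown (assumed spellings).
type Primitive =
  | "policy"
  | "principle"
  | "pattern"
  | "concept"
  | "reference"
  | "procedure"
  | "evaluation"
  | "demonstration";

type Zone = 0 | 1 | 2 | 3;

// Record<Primitive, Zone> requires an entry for every primitive, so nothing can be "unzoned".
const ZONE_OF: Record<Primitive, Zone> = {
  policy: 0, // Zone 0: Constitution
  principle: 0,
  pattern: 1, // Zone 1: Context
  concept: 1,
  reference: 1,
  procedure: 2, // Zone 2: Action
  evaluation: 2,
  demonstration: 3, // Zone 3: Steering
};

interface Atom {
  primitive: Primitive;
  text: string;
}

// Hypothetical assembler step: bucket atoms by zone, then emit zones in ascending order.
// Within a zone, atoms keep their input order.
function assemble(atoms: Atom[]): string[] {
  const zones: string[][] = [[], [], [], []];
  for (const atom of atoms) {
    zones[ZONE_OF[atom.primitive]].push(atom.text);
  }
  return zones.reduce((acc, zone) => acc.concat(zone), [] as string[]);
}

const prompt = assemble([
  { primitive: "demonstration", text: "Example: a migrated config module" },
  { primitive: "procedure", text: "Step 1: add context to each example" },
  { primitive: "principle", text: "Every component must teach or guide" },
]);
// prompt → ["Every component must teach or guide", "Step 1: add context to each example", "Example: a migrated config module"]
```

Under this design, adding a new primitive to the union without assigning it a zone is a compile-time error rather than an assembler edge case.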

### Negative

1. **Migration Burden**
   - Existing modules with Data components must be rewritten
   - Requires manual review to add appropriate context
   - Cannot be fully automated (requires human judgment)

2. **Verbosity for Simple Reference Data**
   - Configuration examples now require explanation and rationale
   - More boilerplate than raw JSON in v2.2
   - May feel excessive for obvious data (e.g., port numbers)

3. **No Direct Data Storage**
   - Cannot store machine-readable config without prose wrapper
   - May frustrate users wanting pure key-value storage

---

## Migration Strategy

### For Modules Using Data Component

**Before (v2.2 with Data component):**
```typescript
data: {
  id: 'config',
  format: 'json',
  description: 'Database configuration',
  value: {
    host: 'localhost',
    port: 5432,
    maxConnections: 20
  }
}
```

**After (v3.0 using Knowledge.examples):**
```typescript
knowledge: {
  id: 'setup-examples',
  explanation: 'Configuration examples for database setup',
  examples: [
    {
      title: 'Production Database Configuration',
      rationale: 'Shows recommended production settings with SSL and connection pooling',
      language: 'json',
      snippet: `{
  "host": "db.production.example.com",
  "port": 5432,
  "maxConnections": 20,
  "ssl": true,
  "pooling": {
    "min": 2,
    "max": 10
  }
}`
    }
  ]
}
```
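
Because the migration requires human judgment, tooling can at most produce a draft. Here is a minimal sketch, assuming the field names used in the Before and After examples above: it turns a v2.2 `data` payload into a `Knowledge.examples` entry by serializing the value into `snippet` and leaving the rationale as a TODO for a reviewer. `LegacyData`, `ExampleDraft`, and `draftExampleFromData` are hypothetical, and only JSON is handled.

```typescript
// Shape of the v2.2 `data` field as used in the Before example.
interface LegacyData {
  id: string;
  format: string;
  description?: string;
  value: unknown;
}

// Shape of a Knowledge.examples entry as used in the After example.
interface ExampleDraft {
  title: string;
  rationale: string;
  language: string;
  snippet: string;
}

// Produce a draft example; a human reviewer must replace the TODO text.
function draftExampleFromData(data: LegacyData): ExampleDraft {
  if (data.format !== "json") {
    // Other formats (yaml, xml, ...) would need their own serializer.
    throw new Error(`Unsupported format for automatic drafting: ${data.format}`);
  }
  return {
    title: data.description ?? `TODO: title for '${data.id}'`,
    rationale: "TODO: explain why these values are recommended and when to use them",
    language: data.format,
    snippet: JSON.stringify(data.value, null, 2),
  };
}
```

Running it on the Before example yields a draft whose `snippet` holds the same three settings as pretty-printed JSON; the reviewer still has to write the rationale and decide whether the values belong in `examples` or `concepts`.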

### Alternative: Use Concept for Schema Definitions

**For structural/schema data:**
```typescript
knowledge: {
  id: 'api-schema',
  explanation: 'Data structures for API communication',
  concepts: [
    {
      name: 'UserProfile',
      description: 'User account information structure',
      rationale: 'Standardized across all user-facing endpoints',
      example: `interface UserProfile {
  id: string;
  email: string;
  displayName: string;
  role: 'admin' | 'user' | 'guest';
  createdAt: Date;
}`
    }
  ]
}
```

### Decision Criteria

| Use Case | Recommended Approach | Rationale |
|----------|---------------------|-----------|
| Configuration examples | `Knowledge.examples` | Show concrete usage with context |
| Schema definitions | `Knowledge.concepts` | Define structure conceptually |
| Large reference data | External files + reference | Keep modules focused |
| Key-value lookup | Wait for Reference RAG | Future dedicated system |

---

## Future Direction

### Reference RAG Strategy (Post-v3.0)

A dedicated "Reference RAG" system may be introduced as a **sidecar tool** for key-value lookup use cases:

- **Separate storage**: Reference data stored outside module files
- **Optimized retrieval**: Direct key-value lookup without vector search
- **No embedding noise**: Reference data excluded from semantic search
- **Typed interfaces**: Full TypeScript support without `unknown` types

This would serve use cases like:
- Large configuration dictionaries
- API reference tables
- Error code mappings
- Translation tables
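
Reference RAG is only planned, so no interface for it exists yet. As a hedged illustration of the "direct key-value lookup" and "typed interfaces" points, a reference table could be a typed map that never passes through an embedding step. `ReferenceTable`, `ErrorInfo`, and the error codes are hypothetical.

```typescript
// Hypothetical typed reference table: exact-key lookup, no embedding or vector search.
class ReferenceTable<K extends string, V> {
  readonly name: string;
  private readonly entries: ReadonlyMap<K, V>;

  constructor(name: string, entries: Record<K, V>) {
    this.name = name;
    const keys = Object.keys(entries) as K[];
    this.entries = new Map<K, V>(keys.map((k): [K, V] => [k, entries[k]]));
  }

  // Exact lookup: an unknown key yields undefined rather than a "closest" semantic match.
  get(key: K): V | undefined {
    return this.entries.get(key);
  }

  // Narrows an arbitrary string (e.g. parsed from a log line) to a known key.
  has(key: string): key is K {
    return this.entries.has(key as K);
  }
}

// Example: an error-code mapping, one of the use cases listed above.
interface ErrorInfo {
  message: string;
  retryable: boolean;
}

const errorCodes = new ReferenceTable<"E001" | "E002", ErrorInfo>("error-codes", {
  E001: { message: "Connection refused", retryable: true },
  E002: { message: "Invalid credentials", retryable: false },
});
```

Here both the keys and the value shape are checked at compile time, which is the property the `value: unknown` field could not provide.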

**Status:** Planned for post-v3.0, pending demand validation

---

## References

- UMS v3.0 Specification: `docs/spec/v3.0/unified_module_system_v3.0_spec.md`
- Migration Guide: `docs/spec/v3.0/migration_from_v2.2.md`, Section 5.3
- UMS v2.2 Data Component: `docs/spec/unified_module_system_v2_spec.md`, Section 2.2.3
- Layered Cake Assembler: UMS v3.0 Specification, Section 4.2
- ADR-0004: Machine-First Module Architecture

---

## Notes

This decision reflects lessons learned from practical usage of UMS v2.x systems. While the Data component appeared useful in theory, its implementation created more problems than it solved. By requiring all content to have clear pedagogical purpose and contextual framing, v3.0 maintains architectural consistency and improves AI consumption quality.

For users needing pure data storage, the recommended approach is to keep reference data in separate configuration files and reference them by name in Knowledge components. This separation of concerns aligns with best practices for both human and machine consumption.
