| 1 | +# ADR-0009: Remove Data Component from UMS v3.0 |
| 2 | + |
| 3 | +**Status:** Accepted |
| 4 | +**Date:** 2025-11-25 |
| 5 | +**Deciders:** Jason Knight |
| 6 | +**Related:** UMS v3.0 Specification, ADR-0004 (Machine-First Module Architecture) |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## Context |
| 11 | + |
| 12 | +In UMS v2.0 through v2.2, modules could include a `DataComponent` for storing reference information such as configuration data, schemas, or structured reference content. The component was designed to hold raw data in various formats (JSON, YAML, XML) with minimal contextual information: |
| 13 | + |
| 14 | +```typescript |
| 15 | +interface DataComponent { |
| 16 | + type: "data"; |
| 17 | + metadata?: ComponentMetadata; |
| 18 | + data: { |
| 19 | + format: string; // Media type (json, yaml, xml, etc.) |
| 20 | + description?: string; // What this data represents |
| 21 | + value: unknown; // The actual data |
| 22 | + }; |
| 23 | +} |
| 24 | +``` |
| 25 | + |
| 26 | +**Example Usage in v2.2:** |
| 27 | +```typescript |
| 28 | +data: { |
| 29 | + id: 'config', |
| 30 | + format: 'json', |
| 31 | + description: 'Database configuration', |
| 32 | + value: { |
| 33 | + host: 'localhost', |
| 34 | + port: 5432, |
| 35 | + maxConnections: 20 |
| 36 | + } |
| 37 | +} |
| 38 | +``` |
| 39 | + |
| 40 | +While the Data component appeared useful for reference data, practical experience and the introduction of UMS v3.0's architectural improvements revealed fundamental issues that made it incompatible with the system's design goals. |
| 41 | + |
| 42 | +### Problems Identified |
| 43 | + |
| 44 | +1. **No Clear Cognitive Level** |
| 45 | + - The Cognitive Hierarchy (levels 0-6) classifies content by abstraction: from universal axioms (level 0) to meta-cognition (level 6) |
| 46 | + - Raw data does not fit this hierarchy—it is neither instructional, conceptual, nor foundational |
| 47 | + - Data lacks the pedagogical or operational intent that defines all other components |
| 48 | + |
| 49 | +2. **Vector Search Noise** |
| 50 | + - Raw JSON/YAML produces poor-quality embeddings in vector databases |
| 51 | + - Structured data without prose context generates semantically weak vectors |
| 52 | + - Search queries for concepts or instructions return irrelevant data matches |
| 53 | + - The resulting noise degrades RAG retrieval precision across the entire module ecosystem |
| 54 | + |
| 55 | +3. **No Clear Zone in Layered Cake Assembler** |
| 56 | + - UMS v3.0 introduces the Layered Cake assembler with four attention-optimized zones: |
| 57 | + - Zone 0 (Constitution): Policies + Principles |
| 58 | + - Zone 1 (Context): Patterns + Concepts + References |
| 59 | + - Zone 2 (Action): Procedures + Evaluations |
| 60 | + - Zone 3 (Steering): Demonstrations |
| 61 | + - The Data component cannot be logically placed in any zone |
| 62 | + - Raw data is neither normative (Zone 0), conceptual (Zone 1), instructional (Zone 2), nor exemplary (Zone 3) |
| 63 | + |
| 64 | +4. **TypeScript Type Safety Hole** |
| 65 | + - The `value: unknown` type gives the compiler no information about the payload's shape, so consumers must cast or narrow it by hand |
| 66 | + - Defeats the UMS v2.0+ goal of TypeScript-first type safety |
| 67 | + - An incorrect cast compiles cleanly and fails at runtime, where a typed field would have failed at compile time |
| 68 | + - Incompatible with the typed primitive system in v3.0 |
| 69 | + |
| 70 | +5. **Architectural Mismatch** |
| 71 | + - The Data component was the only component without a clear pedagogical purpose |
| 72 | + - Other components (Foundation, Instruction, Knowledge) teach principles, guide actions, or explain concepts |
| 73 | + - Data simply stores—it doesn't instruct, explain, or guide |
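The type-safety hole from item 4 can be illustrated with a minimal sketch. `DataPayload` is reduced from the v2.x interface shown earlier; `DbConfig` and `isDbConfig` are hypothetical consumer-side names introduced only for this illustration and are not part of any UMS version.

```typescript
// Reduced from the v2.x DataComponent above: only the payload fields.
interface DataPayload {
  format: string;
  description?: string;
  value: unknown;
}

// Hypothetical consumer-side type; not defined by any UMS version.
interface DbConfig {
  host: string;
  port: number;
  maxConnections: number;
}

// An authoring mistake: `port` is a string. Because `value` is `unknown`,
// the compiler accepts this object without complaint.
const payload: DataPayload = {
  format: "json",
  description: "Database configuration",
  value: { host: "localhost", port: "5432", maxConnections: 20 },
};

// `payload.value.port` does not compile, so consumers typically cast.
// The cast compiles, yet the declared `number` is a string at runtime.
const castConfig = payload.value as DbConfig;
const castPortType = typeof castConfig.port; // "string"

// The only safe path is a hand-written runtime guard per payload shape.
function isDbConfig(v: unknown): v is DbConfig {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return (
    typeof r["host"] === "string" &&
    typeof r["port"] === "number" &&
    typeof r["maxConnections"] === "number"
  );
}

const guardAccepts = isDbConfig(payload.value); // false
```

Every consumer of a Data component would need a guard like this for every payload shape, which is the per-module cost that typed primitives avoid.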
| 74 | + |
| 75 | +--- |
| 76 | + |
| 77 | +## Decision |
| 78 | + |
| 79 | +**Remove the `DataComponent` entirely from UMS v3.0.** |
| 80 | + |
| 81 | +Modules may no longer include a `data` field or Data component. All reference information must be presented with appropriate context using existing Knowledge or Foundation components. |
| 82 | + |
| 83 | +### Key Changes |
| 84 | + |
| 85 | +1. **Removed from Module Schema** |
| 86 | + - `data?: DataComponent` removed from Module interface |
| 87 | + - `DataComponent` type removed from component union |
| 88 | + - No primitive mappings for Data component in v3.0 |
| 89 | + |
| 90 | +2. **Migration Path Provided** |
| 91 | + - Convert Data to `Knowledge.examples` (Demonstration primitive) |
| 92 | + - Or convert to `Knowledge.concepts` (Concept primitive) |
| 93 | + - Add contextual explanation and rationale |
| 94 | + |
| 95 | +3. **Updated Component Structure** |
| 96 | + - Three core components remain: Foundation, Instruction, Knowledge |
| 97 | + - All components map cleanly to the 7 atomic primitives |
| 98 | + - All primitives have clear Layered Cake zones |
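The "every primitive has a zone" property can be expressed as a compile-time check. The sketch below is only an illustration: the primitive names are read off the zone listing in the Context section and may not correspond one-to-one to the identifiers or the primitive count in the v3.0 spec, and `ZONE_OF` and `assembleOrder` are hypothetical names. It also assumes that zones are emitted in ascending order.

```typescript
// Zone numbers as listed for the Layered Cake assembler.
type Zone = 0 | 1 | 2 | 3;

// Names taken from the zone listing; the exact identifiers used by the
// v3.0 schema are an assumption.
type Primitive =
  | "policy" | "principle"
  | "pattern" | "concept" | "reference"
  | "procedure" | "evaluation"
  | "demonstration";

// Record<Primitive, Zone> forces an entry for every primitive: omitting
// one, or adding a zone-less key such as "data", fails to compile.
const ZONE_OF: Record<Primitive, Zone> = {
  policy: 0, principle: 0,
  pattern: 1, concept: 1, reference: 1,
  procedure: 2, evaluation: 2,
  demonstration: 3,
};

// Order primitives by zone (assumed ascending), leaving the input untouched.
function assembleOrder(primitives: Primitive[]): Primitive[] {
  return primitives.slice().sort((a, b) => ZONE_OF[a] - ZONE_OF[b]);
}

const ordered = assembleOrder(["demonstration", "procedure", "concept", "policy"]);
```

With a Data component in the union, the mapping would need either a fifth zone or an optional zone, and the assembler would need a special case for it.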
| 99 | + |
| 100 | +--- |
| 101 | + |
| 102 | +## Consequences |
| 103 | + |
| 104 | +### Positive |
| 105 | + |
| 106 | +1. **Cleaner Cognitive Hierarchy** |
| 107 | + - All components now fit the 0-6 cognitive level classification |
| 108 | + - No special-case handling for "contentless" components |
| 109 | + |
| 110 | +2. **Improved Vector Search Quality** |
| 111 | + - All embedded content includes prose context and explanation |
| 112 | + - Better semantic relevance in RAG retrieval |
| 113 | + - Reduced false positives in module discovery |
| 114 | + |
| 115 | +3. **Type Safety Restored** |
| 116 | + - No `unknown` types in component structure |
| 117 | + - Full TypeScript inference across all primitives |
| 118 | + - Compile-time validation of module content |
| 119 | + |
| 120 | +4. **Simplified Assembler Logic** |
| 121 | + - All primitives map cleanly to Layered Cake zones |
| 122 | + - No edge cases for "unzoned" content |
| 123 | + - Clearer mental model for module authors |
| 124 | + |
| 125 | +5. **Pedagogical Consistency** |
| 126 | + - All components serve a teaching or guiding purpose |
| 127 | + - Reference data is always presented with "why" and "how to use" |
| 128 | + - Better learning experience for AI consumers |
| 129 | + |
| 130 | +### Negative |
| 131 | + |
| 132 | +1. **Migration Burden** |
| 133 | + - Existing modules with Data components must be rewritten |
| 134 | + - Requires manual review to add appropriate context |
| 135 | + - Cannot be fully automated (requires human judgment) |
| 136 | + |
| 137 | +2. **Verbosity for Simple Reference Data** |
| 138 | + - Configuration examples now require explanation and rationale |
| 139 | + - More boilerplate than raw JSON in v2.2 |
| 140 | + - May feel excessive for obvious data (e.g., port numbers) |
| 141 | + |
| 142 | +3. **No Direct Data Storage** |
| 143 | + - Cannot store machine-readable config without a prose wrapper |
| 144 | + - May frustrate users wanting pure key-value storage |
| 145 | + |
| 146 | +--- |
| 147 | + |
| 148 | +## Migration Strategy |
| 149 | + |
| 150 | +### For Modules Using Data Component |
| 151 | + |
| 152 | +**Before (v2.2 with Data component):** |
| 153 | +```typescript |
| 154 | +data: { |
| 155 | + id: 'config', |
| 156 | + format: 'json', |
| 157 | + description: 'Database configuration', |
| 158 | + value: { |
| 159 | + host: 'localhost', |
| 160 | + port: 5432, |
| 161 | + maxConnections: 20 |
| 162 | + } |
| 163 | +} |
| 164 | +``` |
| 165 | + |
| 166 | +**After (v3.0 using Knowledge.examples):** |
| 167 | +```typescript |
| 168 | +knowledge: { |
| 169 | + id: 'setup-examples', |
| 170 | + explanation: 'Configuration examples for database setup', |
| 171 | + examples: [ |
| 172 | + { |
| 173 | + title: 'Production Database Configuration', |
| 174 | + rationale: 'Shows recommended production settings with SSL and connection pooling', |
| 175 | + language: 'json', |
| 176 | + snippet: `{ |
| 177 | + "host": "db.production.example.com", |
| 178 | + "port": 5432, |
| 179 | + "maxConnections": 20, |
| 180 | + "ssl": true, |
| 181 | + "pooling": { |
| 182 | + "min": 2, |
| 183 | + "max": 10 |
| 184 | + } |
| 185 | +}` |
| 186 | + } |
| 187 | + ] |
| 188 | +} |
| 189 | +``` |
| 190 | + |
| 191 | +### Alternative: Use Concept for Schema Definitions |
| 192 | + |
| 193 | +**For structural/schema data:** |
| 194 | +```typescript |
| 195 | +knowledge: { |
| 196 | + id: 'api-schema', |
| 197 | + explanation: 'Data structures for API communication', |
| 198 | + concepts: [ |
| 199 | + { |
| 200 | + name: 'UserProfile', |
| 201 | + description: 'User account information structure', |
| 202 | + rationale: 'Standardized across all user-facing endpoints', |
| 203 | + example: `interface UserProfile { |
| 204 | + id: string; |
| 205 | + email: string; |
| 206 | + displayName: string; |
| 207 | + role: 'admin' | 'user' | 'guest'; |
| 208 | + createdAt: Date; |
| 209 | +}` |
| 210 | + } |
| 211 | + ] |
| 212 | +} |
| 213 | +``` |
| 214 | + |
| 215 | +### Decision Criteria |
| 216 | + |
| 217 | +| Use Case | Recommended Approach | Rationale | |
| 218 | +|----------|---------------------|-----------| |
| 219 | +| Configuration examples | `Knowledge.examples` | Show concrete usage with context | |
| 220 | +| Schema definitions | `Knowledge.concepts` | Define structure conceptually | |
| 221 | +| Large reference data | External files + reference | Keep modules focused | |
| 222 | +| Key-value lookup | External files for now; Reference RAG if introduced | Dedicated system is planned but not yet available |
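Migration cannot be fully automated, but the mechanical part can be scaffolded. The sketch below is one possible helper, not a tool that ships with UMS: `LegacyData`, `ExampleDraft`, `NEEDS_REVIEW`, and `draftExampleFromData` are hypothetical names. The field names (`title`, `rationale`, `language`, `snippet`) are taken from the v3.0 example above.

```typescript
// Payload fields as shown in the v2.2 example above.
interface LegacyData {
  format: string;
  description?: string;
  value: unknown;
}

// Example fields as shown in the v3.0 migration example above.
interface ExampleDraft {
  title: string;
  rationale: string;
  language: string;
  snippet: string;
}

// Hypothetical marker for fields that still need human-written context.
const NEEDS_REVIEW = "TODO(author): explain why this example matters";

// Hypothetical helper: produces a draft only. The rationale must be
// written by a person before the module is published.
function draftExampleFromData(data: LegacyData): ExampleDraft {
  // Strings are kept verbatim; other values are serialized as JSON.
  const snippet =
    typeof data.value === "string"
      ? data.value
      : JSON.stringify(data.value, null, 2);
  return {
    title: data.description ?? "Untitled example",
    rationale: NEEDS_REVIEW,
    language: data.format,
    snippet,
  };
}

const draft = draftExampleFromData({
  format: "json",
  description: "Database configuration",
  value: { host: "localhost", port: 5432, maxConnections: 20 },
});
```

Leaving `rationale` as an explicit placeholder keeps the human-judgment step visible: a lint rule or review checklist can reject any module that still contains the marker.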
| 223 | + |
| 224 | +--- |
| 225 | + |
| 226 | +## Future Direction |
| 227 | + |
| 228 | +### Reference RAG Strategy (Post-v3.0) |
| 229 | + |
| 230 | +A dedicated "Reference RAG" system may be introduced as a **sidecar tool** for key-value lookup use cases: |
| 231 | + |
| 232 | +- **Separate storage**: Reference data stored outside module files |
| 233 | +- **Optimized retrieval**: Direct key-value lookup without vector search |
| 234 | +- **No embedding noise**: Reference data excluded from semantic search |
| 235 | +- **Typed interfaces**: Full TypeScript support without `unknown` types |
| 236 | + |
| 237 | +This would serve use cases like: |
| 238 | +- Large configuration dictionaries |
| 239 | +- API reference tables |
| 240 | +- Error code mappings |
| 241 | +- Translation tables |
| 242 | + |
| 243 | +**Status:** Planned for post-v3.0, pending demand validation |
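Since no Reference RAG design exists yet, the following is purely a hypothetical sketch of what "typed interfaces" and "direct key-value lookup" could look like. `ReferenceTable`, `ErrorInfo`, and the error codes are invented for illustration and do not come from the UMS specification.

```typescript
// Hypothetical sketch: a typed, in-memory reference table with exact-key
// lookup. No embedding or similarity search is involved.
class ReferenceTable<V> {
  constructor(private readonly entries: Readonly<Record<string, V>>) {}

  // Returns the entry for an exact key, or undefined when it is absent.
  get(key: string): V | undefined {
    return Object.prototype.hasOwnProperty.call(this.entries, key)
      ? this.entries[key]
      : undefined;
  }
}

// Example value type for an error-code mapping.
interface ErrorInfo {
  message: string;
  retryable: boolean;
}

const errorCodes = new ReferenceTable<ErrorInfo>({
  E1001: { message: "Connection refused", retryable: true },
  E1002: { message: "Invalid credentials", retryable: false },
});

const known = errorCodes.get("E1001");   // typed as ErrorInfo | undefined
const missing = errorCodes.get("E9999"); // undefined
```

The value type is a generic parameter rather than `unknown`, so the table's shape is checked where the data is declared rather than where it is consumed.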
| 244 | + |
| 245 | +--- |
| 246 | + |
| 247 | +## References |
| 248 | + |
| 249 | +- UMS v3.0 Specification: `docs/spec/v3.0/unified_module_system_v3.0_spec.md` |
| 250 | +- Migration Guide: `docs/spec/v3.0/migration_from_v2.2.md`, Section 5.3 |
| 251 | +- UMS v2.2 Data Component: `docs/spec/unified_module_system_v2_spec.md`, Section 2.2.3 |
| 252 | +- Layered Cake Assembler: UMS v3.0 Specification, Section 4.2 |
| 253 | +- ADR-0004: Machine-First Module Architecture |
| 254 | + |
| 255 | +--- |
| 256 | + |
| 257 | +## Notes |
| 258 | + |
| 259 | +This decision reflects lessons learned from practical usage of UMS v2.x systems. While the Data component appeared useful in theory, its implementation created more problems than it solved. By requiring all content to have clear pedagogical purpose and contextual framing, v3.0 maintains architectural consistency and improves AI consumption quality. |
| 260 | + |
| 261 | +For users needing pure data storage, the recommended approach is to keep reference data in separate configuration files and reference them by name in Knowledge components. This separation of concerns aligns with best practices for both human and machine consumption. |