DecapodLabs
diff --git a/.decapod/OVERRIDE.md b/.decapod/OVERRIDE.md
Lines changed: 2 additions & 0 deletions
diff --git a/constitution/architecture/ALGORITHMS.md b/constitution/architecture/ALGORITHMS.md
Lines changed: 10 additions & 0 deletions
diff --git a/constitution/architecture/CACHING.md b/constitution/architecture/CACHING.md
Lines changed: 10 additions & 0 deletions
diff --git a/constitution/architecture/CLOUD.md b/constitution/architecture/CLOUD.md
Lines changed: 11 additions & 0 deletions
diff --git a/constitution/architecture/CONCURRENCY.md b/constitution/architecture/CONCURRENCY.md
Lines changed: 26 additions & 14 deletions
diff --git a/constitution/architecture/DATA.md b/constitution/architecture/DATA.md
Lines changed: 12 additions & 0 deletions
diff --git a/constitution/architecture/FRONTEND.md b/constitution/architecture/FRONTEND.md
Lines changed: 12 additions & 0 deletions
diff --git a/constitution/architecture/OBSERVABILITY.md b/constitution/architecture/OBSERVABILITY.md
Lines changed: 12 additions & 0 deletions
diff --git a/constitution/architecture/SECURITY.md b/constitution/architecture/SECURITY.md
Lines changed: 14 additions & 0 deletions
@@ -13,6 +13,8 @@
 
 ## Core Overrides (Routers and Indices)
 
+### core/ENGINEERING_EXCELLENCE.md
+
 ### core/DECAPOD.md
 
 ### core/INTERFACES.md
 
@@ -30,6 +30,16 @@
 - **Parallelism:** Amdahl's Law limits
 - **Constants:** 2× slower is still O(n)
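The Amdahl's Law limit listed above can be stated as a worked formula. Writing $p$ for the parallelizable fraction of the work and $N$ for the number of workers (notation chosen here for illustration), the ideal speedup, ignoring coordination overhead, is:

$$
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
$$

With $p = 0.9$ and $N = 100$, $S = 1/(0.1 + 0.009) \approx 9.17$, and no worker count can push it past $10\times$.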
 
+### 1.4 Production Mindset
+The gap between academic algorithm knowledge and production engineering is real:
+
+- **Standard libraries first:** Most business value lives in domain logic, not sorting internals. Use language-native, battle-tested implementations. Custom algorithms are warranted only when the standard approach imposes a measurable, load-bearing bottleneck.
+- **Maintenance cost is a first-class constraint:** A clever algorithm maintained by one person is a single point of failure. Favor correct and readable over theoretically optimal.
+- **Data locality beats asymptotic complexity for small n:** Many production collections are small (n < 1000). At that size, an O(n²) pass over contiguous memory frequently outperforms an O(n log n) structure that chases pointers, because memory latency, not instruction count, is often the real bottleneck on modern hardware.
+- **Prefer scale-out over scale-up:** An O(n log n) algorithm that parallelizes cleanly across 100 machines is often more practical than an O(n) algorithm that must remain single-threaded.
+- **Determinism is a correctness property:** In a system governed by reproducible validation, algorithms must produce identical output for identical input. Avoid non-deterministic choices (e.g., unseeded random pivots) anywhere output is compared or stored.
+- **Resource budgets are not optional:** Every algorithm must have time and memory bounds enforced at the call site. An algorithm that may run forever or allocate without limit is a bug, not a performance risk.
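The standard-library, determinism, and budget bullets can be combined in one small sketch. It assumes a hypothetical `canonical_tags` helper and an illustrative `DEFAULT_MAX_ITEMS` budget; the point is that the budget is checked while input is consumed and the output order does not depend on hash ordering.

```python
from typing import Iterable, List

# Hypothetical budget for illustration; a real value comes from the caller's requirements.
DEFAULT_MAX_ITEMS = 10_000


class BudgetExceeded(Exception):
    """Raised when input exceeds the resource budget declared at the call site."""


def canonical_tags(tags: Iterable[str], max_items: int = DEFAULT_MAX_ITEMS) -> List[str]:
    """Deduplicate tags and return them in a reproducible order.

    Iteration order of a set of str can differ between interpreter runs
    (string hashing is randomized per process), so the result is sorted
    with the standard library instead of being returned in set order.
    """
    unique = set()
    for count, tag in enumerate(tags, start=1):
        # Enforce the budget while consuming input, so an unbounded
        # iterable cannot run forever or grow memory without limit.
        if count > max_items:
            raise BudgetExceeded(f"input exceeds budget of {max_items} items")
        unique.add(tag)
    return sorted(unique)  # built-in sort: battle-tested and deterministic
```

Here `canonical_tags(["beta", "alpha", "beta"])` returns `["alpha", "beta"]` on every run, and an endless generator fails fast with `BudgetExceeded` instead of hanging.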
+
 ---
 
 ## 2. Complexity Analysis
 
@@ -35,6 +35,16 @@ Cache is a **performance optimization**, not a:
 | Consistency | Stale | Fresh |
 | Complexity | High | Low |
 
+### 1.4 Production Mindset
+Before adding a cache, establish a performance budget and verify the cache is necessary:
+
+- **Cache only when the system demands it:** If the system meets latency targets without a cache, adding one only introduces a failure mode. Measure first.
+- **Stale data has a business cost:** The acceptable staleness window is a product decision, not an engineering default. A price shown 5 minutes late may be catastrophically wrong; a user's display name shown 5 minutes stale is harmless. Make this explicit.
+- **A cache is a stateful dependency:** If the cache goes offline and the origin cannot absorb the resulting load, the cache has become load-bearing infrastructure — that is a fragile architecture. Design so the system degrades gracefully when the cache is cold or absent.
+- **CDN vs application cache are different tools:** CDNs serve public, edge-delivered assets; distributed caches (Redis) handle session and application state. Using the wrong layer for the wrong data adds complexity and consistency bugs.
+- **TTL is a fallback, not a strategy:** Time-based expiry is a safety net for when event-driven invalidation fails. For data with defined write paths, use explicit or event-driven invalidation and treat TTL as the last resort.
+- **Measure total round-trip cost:** Serialization, deserialization, and the extra network hop to the cache can cost as much as a direct, well-indexed database read. Benchmark the full cache path before assuming it is faster.
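The invalidation and graceful-degradation bullets can be illustrated with a minimal in-process cache-aside sketch. `CacheAside`, its `load` callback, and the injectable clock are hypothetical names; a distributed cache such as Redis would need the same discipline applied across processes.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class CacheAside(Generic[K, V]):
    """Cache-aside sketch: invalidate on write, TTL only as a backstop."""

    def __init__(self, load: Callable[[K], V], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load            # reads from the origin, the source of truth
        self._ttl = ttl_seconds      # safety net in case an invalidation is missed
        self._clock = clock          # injectable so tests can control time
        self._entries: Dict[K, Tuple[V, float]] = {}  # key -> (value, expires_at)

    def get(self, key: K) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        # Miss or expired: fall through to the origin. A cold or empty
        # cache makes the system slower, not broken.
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: K) -> None:
        """Primary freshness mechanism: call from the write path after the origin commits."""
        self._entries.pop(key, None)
```

Explicit `invalidate` calls keep data fresh on known write paths; the TTL only bounds staleness when one of those calls is missed.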
+
 ---
 
 ## 2. Cache Levels
 
@@ -44,6 +44,17 @@
 - Spot instances for fault-tolerant workloads
 - Right-sizing resources
 
+### 1.5 Production Mindset
+Cloud infrastructure decisions have direct business consequences. Apply the same rigor to infrastructure as to application code:
+
+- **Unit economics are the architecture test:** If the cost to serve one customer exceeds the revenue they generate, the architecture is broken regardless of how elegantly it scales. Every architectural decision has a cost per unit; make it visible.
+- **Portability is leverage, not ideology:** Full vendor lock-in is a negotiating failure. Using managed services accelerates delivery — that's the right trade — but core domain logic must remain portable enough to migrate within a reasonable window if vendor economics turn predatory.
+- **Click-ops in production is a defect:** Infrastructure that was configured through a web console cannot be reviewed, versioned, tested, or recovered reliably. Every production state change must be expressed in code and promoted through the same review process as application changes.
+- **Cost is an engineering signal, not a finance problem:** If an engineer cannot explain the cost impact of a PR, it cannot ship. Cloud spend is a direct output of architectural decisions; teams own that number.
+- **Stateless compute is the default contract:** Any compute that accumulates local state breaks auto-scaling and complicates recovery. If an instance cannot be terminated safely at any moment, the system is brittle by design.
+- **FaaS has a shape constraint:** Serverless functions are excellent for event-driven, bursty workloads. They are poor fits for consistent, high-throughput, latency-sensitive APIs where cold starts are visible and predictable resource allocation matters.
+- **Least privilege is non-negotiable:** IAM roles must be scoped per service, per action, per resource. Wildcard permissions in production are a critical security defect. A compromised service must not be a pivot to adjacent systems.
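Least privilege is easiest to hold when it is checked mechanically in CI. The sketch below assumes AWS-style JSON IAM policy documents (`Statement`, `Effect`, `Action`, `Resource`); `wildcard_findings` is a hypothetical lint, not an AWS tool, and it deliberately ignores constructs such as `NotAction` and conditions.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[str]:
    # IAM allows Action/Resource to be a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def wildcard_findings(policy: Dict[str, Any]) -> List[str]:
    """Flag Allow statements that use wildcard actions or apply to all resources."""
    findings: List[str] = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if "*" in action:
                findings.append(f"statement {i}: wildcard action {action!r}")
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append(f"statement {i}: applies to all resources")
    return findings
```

A CI step could fail the build whenever the returned list is non-empty for a production policy.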
+
 ---
 
 ## 2. Compute Options
 
@@ -25,6 +25,18 @@
 
 **Async:** Use for I/O-bound work with many concurrent connections. Understand the cost: async runtimes add complexity, stack traces become harder to read, and cancellation semantics require care.
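The cancellation point can be made concrete with Python's `asyncio`: when a deadline cancels a task, cleanup only happens if it lives in a `finally` block. `slow_io` and `fetch_with_deadline` are hypothetical names, and `asyncio.sleep` stands in for a real network call.

```python
import asyncio
from typing import List, Optional


async def slow_io(delay: float, log: List[str]) -> str:
    """Stand-in for a network call that holds a resource (e.g. a connection)."""
    log.append("acquired")
    try:
        await asyncio.sleep(delay)
        return "done"
    finally:
        # Runs on success, on error, and when the task is cancelled.
        log.append("released")


async def fetch_with_deadline(delay: float, timeout: float, log: List[str]) -> Optional[str]:
    try:
        # wait_for cancels the inner coroutine once the deadline passes.
        return await asyncio.wait_for(slow_io(delay, log), timeout)
    except asyncio.TimeoutError:
        return None
```

Whether the call finishes or times out, the log ends with `"released"`, because cancellation is delivered as an exception at the `await` and unwinds through `finally`.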
 
+### 1.3 Production Mindset
+Concurrency is one of the highest-leverage and highest-risk categories of engineering decisions:
+
+- **Sequential first:** Do not reach for concurrent architectures until the sequential baseline is exhausted. The simplest correct program is single-threaded. Concurrency is justified by measured need, not anticipated scale.
+- **Coordination is the bottleneck:** Amdahl's Law is a hard limit. If 10% of a workload is sequential, no amount of parallelism yields more than 10× improvement. Design to minimize the sequential fraction, and be explicit about where it lives.
+- **Blast radius isolation:** A concurrency bug (deadlock, livelock, data race) can bring down an entire process or starve a thread pool. Isolate concurrent subsystems behind clear boundaries so failures cannot cascade.
+- **Backpressure is a correctness property:** A system that cannot say "no" when overloaded is not production-ready. Every concurrent queue must be bounded. Unbounded queues are memory leaks with a delayed fuse.
+- **Immutability eliminates the problem class:** Shared mutable state is the root cause of most concurrency bugs. Prefer immutable data, message passing, and copy-on-write semantics. When mutable state is unavoidable, make lock discipline explicit and reviewed.
+- **Explicit state machines over ad-hoc coordination:** Complex concurrent workflows modeled with boolean flags and informal protocols will contain bugs that cannot be reproduced or proven correct. Model them as explicit state machines with defined transitions.
+- **Lock-free is not "free":** Lock-free data structures are expert territory. Unless implementing a low-level primitive where profiling justifies it, lock-free code introduces correctness hazards that testing rarely catches. Use well-tested library implementations.
+- **Async is not free either:** Async runtimes have scheduling overhead. For CPU-bound work, async adds overhead without benefit; use dedicated thread pools. Watch stack sizes, allocation rates, and wake-up patterns under load.
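The backpressure bullet can be sketched with the standard library's `queue.Queue`. Note that `queue.Queue(maxsize=0)` is unbounded, so the sketch rejects a non-positive capacity. `BoundedWorkQueue` and `Overloaded` are hypothetical names.

```python
import queue


class Overloaded(Exception):
    """Tells the caller to shed load, retry later, or return 429/503 upstream."""


class BoundedWorkQueue:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            # queue.Queue treats maxsize <= 0 as "unbounded", which is what we forbid.
            raise ValueError("capacity must be positive")
        self._jobs: "queue.Queue[str]" = queue.Queue(maxsize=capacity)

    def submit(self, job: str) -> None:
        try:
            self._jobs.put_nowait(job)  # say "no" immediately instead of blocking forever
        except queue.Full:
            raise Overloaded(f"queue at capacity ({self._jobs.maxsize})") from None

    def take(self) -> str:
        return self._jobs.get()
```

Rejecting at `submit` pushes the overload signal back to the producer, where it can be handled deliberately rather than surfacing later as memory growth.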
+
 ---
 
 ## 2. Async Discipline
@@ -116,7 +128,20 @@ Rules:
 
 ---
 
-## 5. Anti-Patterns
+## 5. Coordination Patterns
+
+### 5.1 Fan-Out / Fan-In
+Distribute work across workers, collect results. Use bounded concurrency to prevent resource exhaustion.
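A minimal fan-out/fan-in sketch using the standard library's `ThreadPoolExecutor`, where `max_workers` provides the concurrency bound. `fan_out_fan_in` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fan_out_fan_in(work: Callable[[T], R], items: Iterable[T], max_workers: int = 4) -> List[R]:
    """Run `work` over `items` with at most `max_workers` calls in flight.

    Results come back in input order. Executor.map submits every item up
    front, so for very large inputs feed it in chunks to bound memory too.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() re-raises a worker's exception when that result is reached.
        return list(pool.map(work, items))
```

Threads suit I/O-bound fan-out; CPU-bound work in CPython would need a process pool instead.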
+
+### 5.2 Pipeline
+Chain processing stages with channels between them. Each stage runs independently. Backpressure propagates naturally through bounded channels.
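A two-stage pipeline can be sketched with threads and a bounded `queue.Queue` acting as the channel: the producer blocks whenever the consumer falls behind, which is the backpressure described above. `run_pipeline` and the `_DONE` sentinel are hypothetical names.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(source: Iterable[Any], transform: Callable[[Any], Any], capacity: int = 2) -> List[Any]:
    """Producer thread -> bounded channel -> consumer thread."""
    channel: "queue.Queue[Any]" = queue.Queue(maxsize=capacity)
    results: List[Any] = []
    errors: List[BaseException] = []

    def produce() -> None:
        try:
            for item in source:
                channel.put(item)  # blocks while the channel is full: backpressure
        except Exception as exc:
            errors.append(exc)
        finally:
            channel.put(_DONE)  # always signal end of stream so the consumer exits

    def consume() -> None:
        while True:
            item = channel.get()
            if item is _DONE:
                return
            try:
                results.append(transform(item))
            except Exception as exc:  # keep draining so the producer never deadlocks
                errors.append(exc)

    stages = [threading.Thread(target=produce), threading.Thread(target=consume)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    if errors:
        raise errors[0]
    return results
```

Additional stages would chain the same way, each with its own bounded channel.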
+
+### 5.3 Circuit Breaker
+When an external service fails repeatedly, stop calling it temporarily. Prevents cascade failures and gives the service time to recover.
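A minimal, single-threaded circuit breaker sketch, assuming a consecutive-failure threshold, a fixed cooldown, and one half-open trial call. `CircuitBreaker` and `CircuitOpenError` are hypothetical names. A production breaker would also need a lock and would typically allow only one in-flight trial.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("dependency unavailable; failing fast")
            # Cooldown elapsed: half-open, let this trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial re-opens immediately; otherwise open at the threshold.
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result
```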
+
+---
+
+## 6. Anti-Patterns
 
 | Anti-Pattern | Why It's Dangerous | Alternative |
 |---|---|---|
@@ -129,19 +154,6 @@ Rules:
 
 ---
 
-## 6. Coordination Patterns
-
-### 6.1 Fan-Out / Fan-In
-Distribute work across workers, collect results. Use bounded concurrency to prevent resource exhaustion.
-
-### 6.2 Pipeline
-Chain processing stages with channels between them. Each stage runs independently. Backpressure propagates naturally through bounded channels.
-
-### 6.3 Circuit Breaker
-When an external service fails repeatedly, stop calling it temporarily. Prevents cascade failures and gives the service time to recover.
-
----
-
 ## Links
 
 - `methodology/ARCHITECTURE.md` - binding architecture
 
@@ -31,6 +31,18 @@ Every data entity has a single owner:
 - Owner handles migrations
 - Other services access through defined interfaces
 
+### 1.4 Production Mindset
+Data decisions compound over years. Schema choices made at week one outlive three engineering teams:
+
+- **Data is the primary asset:** The most durable output of any engineering effort is clean, structured, accessible data. Code is a snapshot; data persists. Decisions must be data-driven, which requires data to be high-fidelity.
+- **Avoid proprietary data lock-in:** Core data should live in open, portable formats (Postgres, Parquet, Avro). Vendor-specific binary formats create migration debt that compounds as volume grows.
+- **Schema before storage:** There is no such thing as "schemaless in production" — only schema that is unknown to the database and therefore unenforceable. Express schema explicitly using protobuf, JSON Schema, or equivalent. Unstructured data is just data whose structure you haven't modeled yet.
+- **Privacy and deletion are architecture requirements:** Compliance (GDPR, CCPA, HIPAA) is the legal floor. Deletion and anonymization must be designed into the data model from the start, not retrofitted. Data that cannot be deleted on demand is an incident waiting to happen.
+- **Consistency model is a design choice, not a default:** Decide, per dataset, what the system gives up during a network partition (the CAP theorem trade-off) and make that choice explicit. Core transactional state requires consistency (CP); high-frequency event logs can usually prioritize availability (AP). Never drift into an unexamined middle.
+- **Design for the next migration:** Every data structure should be written with its own evolution in mind. If the schema cannot support two live versions simultaneously, the design is incomplete.
+- **Referential integrity is absolute:** If the database supports foreign keys, use them. If it does not, enforce integrity in the application layer. Orphaned references are data rot, and data rot compounds silently until a system fails in an unrecoverable way.
+- **N+1 is an architectural smell:** A loop that issues one query per item is not a performance optimization opportunity — it is a design defect. Use joins, batching, or projection. Catch it in review, not production.
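The referential-integrity and N+1 bullets can both be shown with Python's built-in `sqlite3`. SQLite does not enforce foreign keys unless `PRAGMA foreign_keys = ON` is issued on each connection, which is exactly the kind of default that has to be overridden deliberately. The schema and function names are hypothetical.

```python
import sqlite3
from typing import Dict


def connect() -> sqlite3.Connection:
    """Open an in-memory database with foreign-key enforcement turned on."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign keys unenforced unless enabled on each connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            total_cents INTEGER NOT NULL
        );
        """
    )
    return conn


def order_totals_by_user(conn: sqlite3.Connection) -> Dict[str, int]:
    """One joined, aggregated query instead of one orders query per user (N+1)."""
    rows = conn.execute(
        """
        SELECT u.name, COALESCE(SUM(o.total_cents), 0)
        FROM users AS u
        LEFT JOIN orders AS o ON o.user_id = u.id
        GROUP BY u.id, u.name
        ORDER BY u.name
        """
    ).fetchall()
    return {name: total for name, total in rows}
```

With enforcement on, inserting an order for a nonexistent user raises `sqlite3.IntegrityError` instead of silently creating an orphan.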
+
 ---
 
 ## 2. Storage Selection Framework
 
@@ -38,6 +38,18 @@
 - Color contrast (WCAG AA minimum)
 - Focus management
 
+### 1.5 Production Mindset
+The frontend is not a layer — it is the product. Every decision that degrades the user experience degrades the product itself:
+
+- **Time-to-interactive is a revenue metric:** A bloated JavaScript bundle has a direct, measurable impact on conversion and retention. Every new dependency must justify its payload weight. If a library costs 200KB to format a date, replace it with 5 lines.
+- **Framework stability over novelty:** Rewriting the frontend every time a new framework trends is a net loss. Choose a mature, well-supported ecosystem and hold it. Innovation belongs in the user experience and product capability, not the build toolchain.
+- **Accessibility is a correctness requirement, not a backlog item:** If a core flow cannot be completed with a keyboard and screen reader, the feature is defective. This is both an ethical and legal obligation, and it must be verified before any flow is marked complete.
+- **Standardized components over bespoke CSS:** A consistent, accessible component library is a force multiplier. Custom widget implementations for standard patterns (buttons, modals, selects) accumulate accessibility debt and design drift. Use and maintain a shared system.
+- **State locality reduces complexity:** The largest source of frontend complexity is state that lives farther from its use site than necessary. Reach for global state only when multiple disconnected components strictly require synchronization. Local and URL state should be the defaults.
+- **Choose the rendering model for the use case:** SSR and SSG are the correct defaults for content-heavy pages and SEO-critical surfaces. Pay the cost of a full SPA only when the interface genuinely requires app-level interactivity that cannot be achieved otherwise.
+- **Server-state libraries are the standard:** Manual `useEffect` for data fetching is error-prone and widely superseded. Libraries like React Query and SWR handle caching, deduplication, background refresh, and error states correctly. Use them.
+- **Monitor bundle size as a first-class metric:** Tree-shaking must be verified, not assumed. Bundle analysis should run in CI. Size regressions are caught at PR review, not discovered when performance degrades in production.
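A bundle-size gate can be as small as a script run in CI. This sketch measures gzipped size as an approximation of transfer size. The `dist` directory, the `*.js` pattern, and the function names are hypothetical, and each bundler also has dedicated analysis tooling that may fit better.

```python
import gzip
from pathlib import Path
from typing import Dict, Mapping


def oversized(bundles: Mapping[str, bytes], budget_bytes: int) -> Dict[str, int]:
    """Return {name: gzipped_size} for every bundle whose gzipped size exceeds the budget."""
    result: Dict[str, int] = {}
    for name in sorted(bundles):
        size = len(gzip.compress(bundles[name]))
        if size > budget_bytes:
            result[name] = size
    return result


def check_dist(dist: Path, budget_bytes: int, pattern: str = "*.js") -> Dict[str, int]:
    """Read built bundles from a build output directory and apply the budget."""
    bundles = {str(p.relative_to(dist)): p.read_bytes() for p in dist.rglob(pattern)}
    return oversized(bundles, budget_bytes)
```

A CI wrapper such as `sys.exit(1 if check_dist(Path("dist"), 150_000) else 0)` would then fail the PR on a regression; the path and the 150 KB figure are placeholders, not recommendations.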
+
 ---
 
 ## 2. Rendering Strategies
 
@@ -25,6 +25,18 @@
 - **Sampling is acceptable for high-volume data.** 100% capture at low volume, statistical sampling at high volume.
 - **Cost of observability < cost of not observing.** If you can't see it, you can't fix it.
 
+### 1.3 Production Mindset
+Observability is not a feature bolted on after the system is built — it is the primary mechanism by which a system proves it is operating correctly:
+
+- **SLIs and SLOs are the engineering-business contract:** Service Level Indicators define what "working" means in measurable terms. SLOs define the acceptable threshold. When within error budget, ship features. When outside it, fix reliability. This is not optional and does not require negotiation.
+- **Mean Time to Detection must approach zero:** The goal of observability is to know about a failure before the customer does. If the customer reports the issue first, the observability layer has already failed its primary function.
+- **Telemetry must be correlated:** Metrics, logs, and traces in isolation are incomplete. A single trace ID must link a user-visible request to a specific log line and a spike in a latency histogram. Siloed observability is expensive noise.
+- **Semantic logging, not mechanical logging:** Logs are data, not strings. A log entry should capture the intent and outcome of an operation, not just a sequential chronicle of function calls. Log what happened and why it matters, with machine-parseable fields.
+- **Distributed tracing is mandatory in concurrent systems:** When a request touches multiple async components or services, debugging without a trace is guesswork. Instrument trace propagation at service boundaries from the start — it cannot be added cheaply after the fact.
+- **Instrumentation is production code:** Observability code must be tested, reviewed, and maintained at the same standard as business logic. A silent failure caused by missing or broken instrumentation is a critical defect.
+- **High-volume logs are noise:** Logging every function call or intermediate state is log pollution. It increases cost, slows queries, and buries real signals. Log at the appropriate level; sample traces aggressively at high volume.
+- **The audit trail is the system of record:** In Decapod, observability is the mechanism by which completion is proved. An operation that is not in the audit log did not happen as far as the system is concerned.
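Semantic, correlated logging can be sketched with the standard `logging` and `contextvars` modules: every record is one JSON object carrying the event, its outcome fields, and the current trace ID. The names `trace_id_var`, `JsonFormatter`, and `log_event` are hypothetical; a real service would usually take the trace ID from its tracing library (for example, OpenTelemetry) rather than manage it by hand.

```python
import contextvars
import json
import logging
from typing import Any

# Trace ID of the request currently being handled (None outside a request).
trace_id_var = contextvars.ContextVar("trace_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each record as one machine-parseable JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
            "trace_id": trace_id_var.get(),
        }
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True, default=str)


def log_event(logger: logging.Logger, event: str, **fields: Any) -> None:
    """Log what happened (the event) and why it matters (structured outcome fields)."""
    logger.info(event, extra={"fields": fields})
```

Because `trace_id` is attached by the formatter rather than by each call site, every log line emitted during a request can be joined to that request's trace.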
+
 ---
 
 ## 2. Structured Logging
 
@@ -45,6 +45,20 @@
 - Security requirements are functional requirements
 - Security reviews for architectural changes
 
+### 1.5 Production Mindset
+Security is a property of the system, not a feature layer. Systems that require security to be "added" before release have already failed at architecture:
+
+- **Assume the perimeter is already breached:** Design every component assuming a network-adjacent attacker exists. Lateral movement must be architecturally impossible, not just blocked by policy. Microsegmentation, mTLS, and zero-trust identity make this enforceable.
+- **Trust is technical debt:** Every trusted component or interface is a potential pivot point. Minimize trust boundaries explicitly. Document what is trusted, why, and what the consequences of that trust being violated are.
+- **Compliance is the floor, not the ceiling:** Meeting SOC 2 or HIPAA requirements satisfies a minimum external standard. Real security requires adversarial thinking: red-team your own architecture before an attacker does.
+- **Security must be automated to scale:** Manual security reviews on every PR are a bottleneck that developers will eventually route around. SAST, DAST, dependency scanning, and secret detection must run in CI on every change, without exceptions.
+- **Policy exceptions are vulnerabilities:** An exception to a security policy is a vulnerability with documentation. If a policy is consistently too strict to follow, fix the policy through a formal process — do not grant individual exceptions.
+- **Identity is the perimeter in cloud-native systems:** IP-based trust is meaningless in elastic, multi-tenant infrastructure. Use strong cryptographic identity (mTLS, SPIFFE/SPIRE) for every service-to-service interaction.
+- **Immutable infrastructure limits blast radius:** A compromised instance must not be patched in place. Kill it and redeploy from a known-good image. This is only possible if compute is stateless and infrastructure is defined in code.
+- **Secure defaults are the only reliable defaults:** Any configuration, API, or library that requires explicit action to enable security will eventually ship insecure. Defaults must be secure. Opt-in for relaxed behavior, never opt-in for security.
+- **Agents must operate with minimum necessary context:** When agents process external data or operate on the codebase, they must have access only to the files, tools, and credentials their specific task requires. Over-privileged agents are a significant attack surface. Scope everything.
+- **Validation is the final gate:** In Decapod, `decapod validate` is the last line of automated defense. A change that violates a security specification cannot be promoted. This gate is non-negotiable.
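Python's `ssl.create_default_context()` is a standard-library example of the secure-defaults rule: certificate verification and hostname checking are on unless a caller turns them off. The wrapper below is a hypothetical sketch that keeps that property and makes relaxation an explicit, keyword-only opt-in.

```python
import ssl


def make_client_context(*, allow_unverified: bool = False) -> ssl.SSLContext:
    """Build a TLS client context that stays secure unless the caller opts out by name."""
    # create_default_context() already requires certificates and checks hostnames.
    ctx = ssl.create_default_context()
    # Raise the floor to TLS 1.2 if the platform default is lower; never lower it.
    if ctx.minimum_version < ssl.TLSVersion.TLSv1_2:
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if allow_unverified:
        # Explicit relaxation (e.g. for a local test fixture), visible at the call site.
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

A call such as `make_client_context(allow_unverified=True)` is easy to find in review and easy to forbid with a lint rule, whereas the secure path requires no arguments at all.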
+
 ---
 
 ## 2. Threat Modeling