Why Most Software Ships With Unproven Assumptions
Most software teams face a correctness paradox: they need the rigor of formal verification but can’t afford its cost. Test-Driven Development (TDD) offers a middle path—a pragmatic bridge that translates requirements into executable specifications before coding begins. Formal verification—the mathematical proof that a system behaves as intended—remains a luxury reserved for railways, medical devices, and cryptographic protocols. For the rest of us, it’s simply too expensive, too rigid, or too slow for real-world development. This creates a correctness gap: teams ship software with unproven assumptions and treat bugs as an inevitable tax.
But here’s the reality: most bugs aren’t coding errors—they’re undeclared requirements. A missed edge case in a payment flow. An ambiguous error-handling specification. A mismatch between what users need and what developers build. These aren’t execution failures; they’re failures of clarity.
The data backs this up:
60‒80% of production defects trace back to ambiguous or incomplete requirements (figures widely cited across industry reports, including IBM’s Reducing Rework Through Effective Requirements Management).
Late-stage bug fixes cost 10–100x more than catching issues during design (NASA JSC’s study: Error Cost Escalation Through the Project Life Cycle).
Yet teams are forced into a false dichotomy:
Formal verification: Rigorous but impractical—confined to safety-critical domains where failure means life, not downtime (Formal methods: Practice and experience. ACM Comput. Surv. 41, 4, Article 19).
Ad-hoc testing: Fast but reckless—the default for many teams, where verification is an afterthought until something breaks.
No wonder we accept bugs. The alternatives demand a choice: grind to a halt or ship and pray.
TDD breaks the stalemate. It’s verification that scales—not a replacement for formal methods, but a pragmatic bridge between untested code and mathematical proofs. It doesn’t promise perfection. Instead, it offers something far more practical:
A way to turn requirements into executable examples before writing implementation code.
A focus on reducing ambiguity—because correctness isn’t about perfection; it’s about clarity under constraints.
TDD: The Pragmatic Bridge Between Chaos and Formal Proofs
TDD disciplines developers to translate requirements into code—not through proofs, but through executable examples. The process follows a deceptively simple cycle:

Red: Write a test for a single behavior (e.g., "User can undo last action"). It fails because the functionality doesn’t exist yet.
Green: Write the minimal code to pass the test. No more, no less.
Refactor: Improve the code’s structure while keeping tests passing—preserving behavior.
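To make the cycle concrete, here is a minimal pytest-style sketch of the Red and Green steps for the undo example; the `Editor` class and its methods are hypothetical names used only for illustration.

```python
# Red: written first; it fails because Editor.undo does not exist yet.
def test_user_can_undo_last_action():
    editor = Editor()
    editor.type("hello")
    editor.undo()
    assert editor.text == ""

# Green: the minimal implementation that makes the test pass; Refactor then
# improves its structure while this test keeps guarding the behavior.
class Editor:
    def __init__(self):
        self.text = ""
        self._history = []

    def type(self, chars: str) -> None:
        self._history.append(self.text)
        self.text += chars

    def undo(self) -> None:
        if self._history:
            self.text = self._history.pop()
```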
This cycle mirrors formal verification in key ways:
Tests encode preconditions/postconditions into setup/assertions.
Refactoring with passing tests preserves behavior, akin to proof preservation.
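As a small illustration of that mapping, consider this hypothetical test (the `Account` class is invented for the example): the setup establishes the precondition, the assertion states the postcondition.

```python
def test_withdrawal_reduces_balance():
    # Precondition, encoded in the setup: the account starts with 100.
    account = Account(balance=100)

    # The operation whose behavior is being specified.
    account.withdraw(30)

    # Postcondition, encoded in the assertion: the balance reflects the withdrawal.
    assert account.balance == 70
```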
How TDD Enforces SOLID Principles—Without You Noticing
TDD isn’t just about testing—it’s about designing for testability, which naturally enforces SOLID principles. For example:
Single Responsibility: A hard-to-write test often signals a class doing too much.
Dependency Inversion: Testing through interfaces (not implementations) forces cleaner abstractions.
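A brief sketch of the second point, assuming a hypothetical `PaymentGateway` abstraction: because the checkout code depends on a Protocol rather than a concrete gateway, the test can substitute a fake without touching payment infrastructure.

```python
from typing import Protocol

class PaymentGateway(Protocol):
    def charge(self, amount_cents: int) -> None: ...

class Checkout:
    # Depends on the abstraction, not on a concrete gateway (Dependency Inversion).
    def __init__(self, gateway: PaymentGateway):
        self._gateway = gateway

    def pay(self, amount_cents: int) -> None:
        self._gateway.charge(amount_cents)

class FakeGateway:
    def __init__(self):
        self.charged = []

    def charge(self, amount_cents: int) -> None:
        self.charged.append(amount_cents)

def test_checkout_charges_the_gateway():
    gateway = FakeGateway()
    Checkout(gateway).pay(1999)
    assert gateway.charged == [1999]
```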
Formal Verification vs. TDD: When to Use Each
| Aspect | Formal Verification | Test-Driven Development |
| --- | --- | --- |
| Basis | Abstract mathematical proofs | Concrete executable examples |
| Cost | High (experts, tools) | Low (developer time, standard tools) |
| Flexibility | Rigid (specifications must be complete upfront) | Adaptive (tests evolve with code) |
| Coverage | Complete (for specs) | Sample-based (key behaviors) |
| Tooling | Specialized (e.g., TLA+, Rocq, Isabelle, Atelier B) | Mainstream (e.g., JUnit, pytest, gtest) |
| Best For | Safety-critical logic (e.g., aerospace, medical) | Business logic, APIs, and workflows |
TDD is the sketch to formalism’s blueprint—less rigorous, but far more practical. It won’t guarantee perfection, but it eliminates the majority of costly misunderstandings with minimal overhead. The tradeoff isn’t rigor vs. sloppiness; it’s upfront clarity vs. late-stage fire drills.
Design by Contract (DbC): Runtime Guards, Not Test Replacements
TDD verifies behavior through executable examples; DbC enforces invariants. They’re complementary:
| Aspect | Design by Contract | Test-Driven Development |
| --- | --- | --- |
| Focus | Declarative rules (what must be true) | Executable examples (what should happen) |
| Verification | Static analysis + runtime checks | Dynamic test execution |
| Tooling | Language-specific (e.g., Eiffel, Code Contracts for C#, icontract for Python) | Framework-agnostic (e.g., JUnit, pytest, gtest) |
| Strength | Catches violations dynamically | Validates examples before deployment |
| Limitation | Runtime cost (contract checks) | Misses invariants not in tests |
| Mindset | "Define rules that must always hold." | "Show me examples of correct behavior." |
| Best For | Critical safety nets, API boundaries | Business logic, workflow validation |
Key Insight: DbC isn’t a replacement for testing—it’s an extra layer of defense.
TDD asks: "Does this code behave correctly for these inputs?"
DbC asks: "Does this code violate any rules—no matter the input?"
Together, they cover examples (TDD) and universal rules (DbC).
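As a rough sketch of how the two layers combine in Python, assuming the icontract package mentioned above and a hypothetical `withdraw` function: the contracts state rules that apply to every call, while a TDD test still pins down one concrete example.

```python
import icontract

@icontract.require(lambda amount: amount > 0)
@icontract.require(lambda balance, amount: amount <= balance)
@icontract.ensure(lambda result: result >= 0)
def withdraw(balance: int, amount: int) -> int:
    # Any call that violates a contract raises icontract.ViolationError at runtime.
    return balance - amount

def test_withdraw_reduces_balance():
    # The TDD example: one concrete input/output pair.
    assert withdraw(100, 30) == 70
```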
The Hidden Cost of Late-Stage Bugs—and How TDD Cuts It
Software development has a hidden tax: the later a bug is found, the more it costs. A typo fixed during coding takes minutes. The same typo discovered in production? Emergency patches, rollbacks, reputational damage. These costs aren’t linear—they’re exponential (NASA JSC’s study: Error Cost Escalation Through the Project Life Cycle).
TDD’s ROI: Fail Early, Save 100x
TDD shifts debugging left—from production to design—by turning requirements into executable examples before implementation. This creates a design-time feedback loop that:
Exposes ambiguities early
Writing tests first forces clarity before coding.
Example: A vague requirement like "Handle payment errors gracefully" becomes a concrete test case such as `test_payment_error_shows_user_friendly_message()` (see the sketch below).
Replaces "code-deploy-debug" with "test-code-verify"
Traditional workflow: Code → Deploy → Discover Bug → Debug → Fix → Re-deploy
TDD workflow: Write Test → Code → Verify → Refactor
Difference: TDD surfaces issues when they're cheap to fix.
Acts as static analysis for requirements
Each test is a "breakpoint" for understanding
Failing tests reveal logic gaps before they become bugs
TDD is static debugging for requirements—it catches misunderstandings before you code, not after you ship.
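Here is a minimal pytest sketch of that payment-error test; `PaymentService`, `AlwaysDecliningGateway`, and the message text are hypothetical stand-ins for real application code.

```python
def test_payment_error_shows_user_friendly_message():
    # Simulate the failure mode the original requirement was silent about.
    service = PaymentService(gateway=AlwaysDecliningGateway())

    result = service.charge(order_id="A-100", amount_cents=2599)

    # "Gracefully" is now pinned down to concrete, checkable facts.
    assert result.succeeded is False
    assert result.user_message == "Your card was declined. Please try another payment method."
```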
Behavior-Driven TDD: Write Tests That Survive Refactoring
A common complaint about TDD isn't that it doesn't improve quality—it's that the tests keep breaking. But brittle tests don’t reflect a flaw in TDD; they signal that you’re testing the wrong things. The root cause? Coupling tests to implementation instead of behavior.
The Root Cause of Brittle Tests
Tests break during refactoring when they’re coupled to implementation, not behavior. Brittle tests assume details like:
Internal structure (e.g., private methods, class names)
Implementation details (e.g., "Cart has 5 methods")
Specific algorithms (e.g., "Uses binary search")
The fix? Test what the code should do, not how it does it.
Test Behavior, Not Implementation: A Rule to Live By
Focus on outputs and collaborations, not internals.
✅ Do:
Test outputs: Verify visible behavior, not internal state

```python
# ✅ Behavior-focused: tests the outcome, not the implementation
def test_user_can_add_items_to_cart():
    cart = Cart()
    item = Item("book")  # Item is an illustrative domain type
    cart.add(item)
    assert item in cart.contents
```

Test collaborations: Use interfaces, not concrete classes

```python
from unittest.mock import Mock

# ✅ Mock the interface (the PaymentProcessor protocol), not the implementation
def test_payment_processes():
    processor = Mock(spec=PaymentProcessor)
    cart = Cart()
    cart.checkout(processor)
    processor.charge.assert_called_once()
```
❌ Avoid:
Mocking concrete classes (`Mock(StripeGateway)`)
Asserting on private methods (`assert _calculate_tax()`)
Testing framework internals (`assert model.save()`)
✅ Green Flags:
Test names describe user goals (e.g., `test_user_can_undo_action`)
Tests survive refactoring (e.g., changing `Cart` to `ShoppingBasket`)
Each test answers: "What behavior does this enable?"
How to Turn Requirements Into Executable Tests
Requirements must be actionable and specific to convert into test cases. Vague requirements like "the system should be user-friendly" are difficult to test.
Requirements Smell Test:
✅ Testable: Can you write a failing test for it?
✅ Specific: Defines what, not how.
✅ Measurable: Clear success/failure criteria.
Well-defined requirements convert directly into test cases.
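For example, "user-friendly" cannot be tested directly, but a narrower slice of it can; the `SignupForm` below is a hypothetical illustration.

```python
# Untestable: "the system should be user-friendly."
# Testable slice: "a rejected signup names the field that failed validation."
def test_invalid_email_error_names_the_field():
    form = SignupForm(email="not-an-email", password="s3cret!")

    errors = form.validate()

    # Specific (what, not how) and measurable (clear pass/fail criterion).
    assert errors == {"email": "Please enter a valid email address."}
```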
Robust tests are executable specifications—they document what the system should do, not how it does it. This makes them:
Resilient to refactoring (implementation can change)
Valuable as documentation (clear purpose)
User-focused (not technical concerns)
TDD Workflow: Red-Green-Refactor Done Right
TDD isn’t a theoretical ideal; it’s a practical workflow that adapts to real-world constraints. Here’s how to make it work for your team—without dogma.
Red-Green-Refactor: The TDD Rhythm Explained
TDD’s rhythm is simple but powerful in its discipline:

Red: Define user-centric behavior (not methods/classes).
❌ "Test `Cart.addItem()`" (implementation-focused).
✅ "User can add an item to their cart and see the updated total" (behavior-focused).
Green: Solve it the simplest way possible (hardcode if needed; see the sketch after this list).
- Why? Forces focus on the interface before implementation.
Refactor: Improve code with tests as a safety net.
Now that tests guard the behavior, clean up the code:
Apply SOLID principles (e.g., extract interfaces for mocking).
Remove duplication.
Consider parametrization and performance if relevant.
Rule: If a test breaks during refactoring, it was testing implementation.
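A short sketch of the rhythm with hypothetical `Cart` and `Item` types: Green may legitimately hardcode the answer, and a second test then forces the general implementation, which Refactor cleans up.

```python
from dataclasses import dataclass

@dataclass
class Item:
    price_cents: int

# Red: the test pins down the behavior before any implementation exists.
def test_cart_total_sums_item_prices():
    cart = Cart()
    cart.add(Item(price_cents=500))
    cart.add(Item(price_cents=250))
    assert cart.total_cents() == 750

# Green could start as `return 750`; adding a second test with different
# prices forces the general version below, which Refactor then tidies.
class Cart:
    def __init__(self):
        self._items: list[Item] = []

    def add(self, item: Item) -> None:
        self._items.append(item)

    def total_cents(self) -> int:
        return sum(item.price_cents for item in self._items)
```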
CI/CD for TDD: Automate Your Test Reviews
Problem: Engineers skip manual test runs because waiting breaks flow.
Solution: Automation that enforces verification without human intervention.
Structure test automation for speed and relevance:
| Test Type | CI Trigger | Purpose |
| --- | --- | --- |
| Unit tests | Every commit | Catch logic errors fast |
| Integration tests | Pull request | Verify component interactions |
| End-to-end tests | Nightly/staged | Validate user journeys |
This works because:
Fast feedback keeps developers in flow, with unit tests completing before they switch tasks.
Fail-fast policies block merges on any failure, ensuring the test suite remains a trusted gatekeeper.
The full test suite runs automatically, eliminating gaps from manual "spot checking."
Parallel execution and tiered triggers prevent CI from becoming the bottleneck.
Lean Test Suites: Prune Like a Garden
A healthy test suite grows and shrinks with your system:
Add tests for new behaviors.
Remove tests when behaviors are deprecated.
Prune with purpose:
Deprecated feature? Delete both the code and its tests.
Low coverage? Ask:
Is the untested code required? → Test it.
Is it dead weight? → Delete it.
Every test should verify behavior that delivers user value.
Why? Tests are executable documentation—they should reflect current requirements, not historical debris.
Result: A lean suite that’s faster, more focused, and trusted by the team.
Pragmatic TDD: Start Small, Scale Smart
TDD isn’t an all-or-nothing proposition. Its value lies in pragmatic application, not rigid adherence. Here’s how to adopt it incrementally while preserving its core value.
Where to Start
Apply TDD selectively where it delivers the most value:
High-risk areas: Payment processing, state machines, or safety-critical logic.
New features: Write tests before implementing to clarify requirements.
Bug fixes: Write a test that reproduces the issue, which ensures it stays fixed and prevents regressions (see the sketch below).
Even in legacy systems, new code = new opportunity. Start with the next feature or fix.
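For the bug-fix case above, a hypothetical regression test might look like this; it reproduces the reported defect first and then stays in the suite as a guard.

```python
def test_invoice_split_does_not_lose_cents():
    # Bug report: splitting 1.00 three ways showed 0.99 in the UI.
    invoice = Invoice.split(total_cents=100, ways=3)

    # The test fails until the defect is fixed, then prevents its return.
    assert sum(invoice.line_amounts_cents) == 100
```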
Sell It to Your Team
Frame TDD as design insurance, not extra work:
For developers: "Tests are living documentation—they force clarity on what the code should do, not just how it works."
For managers: "TDD shifts debugging left, where fixes are 100x cheaper."
For skeptics: "It’s not about 100% coverage—it’s about targeted correctness where it counts."
A useful metaphor: "TDD is like a seatbelt: you don’t wear it for the smooth ride, but for when things go wrong."
TDD’s Blind Spots: 3 Cases Where You Need More
TDD provides a floor for correctness, but its sample-based nature misses critical gaps. Address them with these targeted supplements:
| Area of Risk | Why TDD Falls Short | What It Misses | How to Supplement |
| --- | --- | --- | --- |
| Implementation | Focuses on behavior, not internals. | Type/memory safety, performance bottlenecks. | Static analysis + instrumentation + profiling. |
| Complex Behavior | Assumes deterministic execution. | Race conditions, state transitions, external dependencies. | Model checking + chaos engineering. |
| Mathematical Guarantees | Tests samples, not universals. | Infinite inputs, invariants, edge cases. | Property-based testing + Design by Contract + formal methods. |
Rule of Thumb: Start with TDD for core logic, then layer in supplements based on your system’s risk profile.
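As one example of the property-based supplement from the table above, here is a sketch using the hypothesis library; the `serialize`/`parse` pair is hypothetical.

```python
from hypothesis import given, strategies as st

# Example-based TDD checks a handful of inputs; this property asserts a
# round-trip invariant over many generated inputs.
@given(st.dictionaries(st.text(), st.integers()))
def test_parse_inverts_serialize(payload):
    assert parse(serialize(payload)) == payload
```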
TDD doesn’t require all-or-nothing adoption—it’s about failing cheaply on the code that matters. Pick one high-risk component (a payment flow, a state transition, a recurring bug), and write the test you wish you’d had last time it broke. Measure the difference. That’s how pragmatic correctness begins.
Dávid Juhász