Unit Testing: The What, Why, and How (with Practical Examples)

What is a Unit Test?

A unit test verifies the smallest testable part of your software—usually a single function, method, or class—in isolation. Its goal is to prove that, for a given input, the unit produces the expected output and handles edge cases correctly.

Key characteristics

  • Small & fast: millisecond execution, in-memory.
  • Isolated: no real network, disk, or database calls.
  • Repeatable & deterministic: same input → same result.
  • Self-documenting: communicates intended behavior.

A Brief History (How We Got Here)

  • 1960s–1980s: Early testing practices emerged with procedural languages, but were largely ad-hoc and manual.
  • 1990s: Object-oriented programming popularized more modular designs. Kent Beck introduced SUnit for Smalltalk; the “xUnit” family was born.
  • Late 1990s–2000s: JUnit (Java) and NUnit (.NET) pushed unit testing mainstream. Test-Driven Development (TDD) formalized “Red → Green → Refactor.”
  • 2010s–today: Rich ecosystems (pytest, Jest, JUnit 5, RSpec, Go’s testing pkg). CI/CD and DevOps turned unit tests into a daily, automated safety net.

How Unit Tests Work (The Mechanics)

Arrange → Act → Assert (AAA)

  1. Arrange: set up inputs, collaborators (often fakes/mocks).
  2. Act: call the method under test.
  3. Assert: verify outputs, state changes, or interactions.
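
A minimal sketch of the same layout in pytest (the ShoppingCart class is purely illustrative):

# test_cart.py: a minimal AAA-structured test; ShoppingCart is a hypothetical example
class ShoppingCart:
    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def total(self):
        return sum(price for _, price in self.items)

def test_total_sums_item_prices():
    # Arrange: build the unit under test and its inputs
    cart = ShoppingCart()
    cart.add("book", 12.0)
    cart.add("pen", 3.0)

    # Act: call the method under test
    result = cart.total()

    # Assert: verify the observable outcome
    assert result == 15.0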

Test Doubles (isolate the unit)

  • Dummy: unused placeholders to satisfy signatures.
  • Stub: returns fixed data (no behavior verification).
  • Fake: lightweight implementation (e.g., in-memory repo).
  • Mock: verifies interactions (e.g., method X called once).
  • Spy: records calls for later assertions.
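
To make the stub/mock distinction concrete, a small sketch using Python's unittest.mock (send_welcome and its collaborators are hypothetical):

from unittest.mock import Mock

def send_welcome(user_id, repo, mailer):
    # Hypothetical unit under test: looks up a user and emails them
    user = repo.find(user_id)
    mailer.send(user["email"], "Welcome!")

def test_send_welcome_emails_the_user():
    repo = Mock()                      # used as a stub: canned return value only
    repo.find.return_value = {"email": "a@example.com"}
    mailer = Mock()                    # used as a mock: the interaction is verified below

    send_welcome(42, repo, mailer)

    mailer.send.assert_called_once_with("a@example.com", "Welcome!")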

Good Test Qualities (FIRST)

  • Fast, Isolated, Repeatable, Self-Validating, Timely.

Naming & Structure

  • Name: methodName_condition_expectedResult
  • One assertion concept per test (clarity > cleverness).
  • Avoid coupling to implementation details (test behavior).

When Should We Write Unit Tests?

  • New code: ideally before or while coding (TDD).
  • Bug fixes: add a unit test that reproduces the bug first (see the sketch after this list).
  • Refactors: guard existing behavior before changing code.
  • Critical modules: domain logic, calculations, validation.
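
For the bug-fix case, a sketch of capturing the defect as a test before fixing it (the names are illustrative):

# Reported bug: average_price([]) raised ZeroDivisionError in production
def test_average_price_of_empty_order_is_zero():
    # Written first to reproduce the report; it stays in the suite as a regression test
    assert average_price([]) == 0.0

def average_price(prices):
    # Fix: guard against the empty list that used to divide by zero
    return sum(prices) / len(prices) if prices else 0.0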

What not to unit test

  • Auto-generated code, trivial getters/setters, framework wiring (unless it encodes business logic).

Advantages (Why Unit Test?)

  • Confidence & speed: safer refactors, fewer regressions.
  • Executable documentation: shows intended behavior.
  • Design feedback: forces smaller, decoupled units.
  • Lower cost of defects: catch issues early and cheaply.
  • Developer velocity: faster iteration with guardrails.

Practical Examples

Java (JUnit 5 + Mockito)

// src/test/java/com/example/PriceServiceTest.java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;
import static org.mockito.Mockito.*;

class PriceServiceTest {
    @Test
    void applyDiscount_whenVIP_shouldReduceBy10Percent() {
        DiscountPolicy policy = mock(DiscountPolicy.class);
        when(policy.discountFor("VIP")).thenReturn(0.10);

        PriceService service = new PriceService(policy);
        double result = service.applyDiscount(200.0, "VIP");

        assertEquals(180.0, result, 0.0001);
        verify(policy, times(1)).discountFor("VIP");
    }
}

// Production code (for context)
class PriceService {
    private final DiscountPolicy policy;
    PriceService(DiscountPolicy policy) { this.policy = policy; }
    double applyDiscount(double price, String tier) {
        return price * (1 - policy.discountFor(tier));
    }
}
interface DiscountPolicy { double discountFor(String tier); }

Python (pytest)

# app/discount.py
def apply_discount(price: float, tier: str, policy) -> float:
    return price * (1 - policy.discount_for(tier))

# tests/test_discount.py
class FakePolicy:
    def discount_for(self, tier):
        return {"VIP": 0.10, "STD": 0.0}.get(tier, 0.0)

def test_apply_discount_vip():
    from app.discount import apply_discount
    result = apply_discount(200.0, "VIP", FakePolicy())
    assert result == 180.0

In-Memory Fakes Beat Slow Dependencies

// In-memory repository for fast unit tests
// (UserRepo and User are the production interface and type this fake stands in for)
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class InMemoryUserRepo implements UserRepo {
    private final Map<String, User> store = new HashMap<>();
    public void save(User u) { store.put(u.id(), u); }
    public Optional<User> find(String id) { return Optional.ofNullable(store.get(id)); }
}

Integrating Unit Tests into Your Current Process

1) Organize Your Project

/src
  /main
    /java (or /python, /ts, etc.)
  /test
    /java ...

  • Mirror package/module structure under /test.
  • Name tests after the unit: PriceServiceTest, test_discount.py, etc.

2) Make Tests First-Class in CI

GitHub Actions (Java example)

name: build-and-test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with: { distribution: temurin, java-version: '21' }
      - run: ./gradlew test --no-daemon

GitHub Actions (Python example)

name: pytest
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - run: pip install -r requirements.txt
      - run: pytest -q

3) Define “Done” with Tests

  • Pull requests must include unit tests for new/changed logic.
  • Code review checklist: readability, edge cases, negative paths.
  • Coverage gate (sensible threshold; don’t chase 100%).
    Example (Gradle + JaCoCo):
jacocoTestCoverageVerification {
    violationRules {
        rule { limit { counter = 'INSTRUCTION'; minimum = 0.75 } }
    }
}
test.finalizedBy jacocoTestReport, jacocoTestCoverageVerification

4) Keep Tests Fast and Reliable

  • Avoid real I/O; prefer fakes/mocks.
  • Keep each test < 100ms; whole suite in seconds.
  • Eliminate flakiness (random time, real threads, sleeps).
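
On the flakiness point, a sketch of passing time in rather than reading the real clock (the function names are illustrative):

# Deterministic: the test controls "now" instead of calling time.time() or sleeping
def is_token_expired(expires_at: float, now: float) -> bool:
    return now >= expires_at

def test_token_expiry_is_deterministic():
    assert is_token_expired(expires_at=1_000.0, now=1_000.0)
    assert not is_token_expired(expires_at=1_000.0, now=999.9)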

5) Use the Test Pyramid Wisely

  • Unit (broad base): thousands, fast, isolated.
  • Integration (middle): fewer, verify boundaries.
  • UI/E2E (tip): very few, critical user flows only.

A Simple TDD Loop You Can Adopt Tomorrow

  1. Red: write a failing unit test that expresses the requirement.
  2. Green: implement the minimum to pass.
  3. Refactor: clean design safely, keeping tests green.
  4. Repeat; keep commits small and frequent.
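
A tiny illustration of one loop iteration in pytest (slugify is a made-up example):

# Red: this test is written first and fails because slugify does not exist yet
def test_slugify_replaces_spaces_with_dashes():
    assert slugify("Hello World") == "hello-world"

# Green: the minimum implementation that makes the test pass
def slugify(text: str) -> str:
    return text.strip().lower().replace(" ", "-")

# Refactor: later, handle punctuation or unicode as new tests demand, keeping this one green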

Common Pitfalls (and Fixes)

  • Mock-heavy tests that break on refactor → mock only at boundaries; prefer fakes for domain logic.
  • Testing private methods → test through public behavior; refactor if testing is too hard.
  • Slow suites → remove I/O, shrink fixtures, parallelize.
  • Over-asserting → one behavioral concern per test.

Rollout Plan (4 Weeks)

  • Week 1: Set up test frameworks, sample tests, CI pipeline, coverage reporting.
  • Week 2: Add tests for critical modules & recent bug fixes. Create a PR template requiring tests.
  • Week 3: Refactor hot spots guided by tests. Introduce an in-memory fake layer.
  • Week 4: Add coverage gates, stabilize the suite, document conventions in CONTRIBUTING.md.

Team Conventions

  • Folder structure mirrors production code.
  • Names: ClassNameTest or test_function_behavior.
  • AAA layout, one behavior per test.
  • No network/disk/DB in unit tests.
  • PRs must include tests for changed logic.

Final Thoughts

Unit tests pay dividends by accelerating safe change. Start small, keep them fast and focused, and wire them into your daily workflow (pre-commit, CI, PR reviews). Over time, they become living documentation and your best shield against regressions.

Fuzzing: A practical guide for software engineers

Fuzzing is an automated testing technique that feeds large numbers of malformed, unexpected, or random inputs to a program to find crashes, hangs, memory corruption, and other security/robustness bugs. This post explains what fuzzing is, key features and types, how it works (step-by-step), advantages and limitations, real-world use cases, and exactly how to integrate fuzzing into a modern software development process.

What is fuzzing?

Fuzzing (or “fuzz testing”) is an automated technique for finding bugs by supplying a program with many inputs that are unusual, unexpected, or deliberately malformed, and observing for failures (crashes, assertion failures, timeouts, resource leaks, incorrect output, etc.). Fuzzers range from simple random-input generators to sophisticated, feedback-driven engines that learn which inputs exercise new code paths.

Fuzzing is widely used both for security (discovering vulnerabilities an attacker could exploit) and for general robustness testing (finding crashes and undefined behaviour).

Key features (explained)

  1. Automated input generation
    • Fuzzers automatically produce a large volume of test inputs — orders of magnitude more than manual testing — which increases the chance of hitting rare edge cases.
  2. Monitoring and detection
    • Fuzzers monitor the program for signals of failure: crashes, memory-safety violations (use-after-free, buffer overflow), assertion failures, infinite loops/timeouts, and sanitizer reports.
  3. Coverage / feedback guidance
    • Modern fuzzers use runtime feedback (e.g., code coverage) to prefer inputs that exercise previously unvisited code paths, greatly improving effectiveness over pure random mutation.
  4. Instrumentation
    • Instrumentation (compile-time or runtime) gathers execution information such as branch coverage, comparisons, or tainting. This enables coverage-guided fuzzing and faster discovery of interesting inputs.
  5. Test harness / drivers
    • The target often needs a harness — a small wrapper that feeds inputs to a specific function or module — letting fuzzers target internal code directly instead of whole applications.
  6. Minimization and corpus management
    • Good fuzzing workflows reduce (minimize) crashing inputs to the smallest test case that still reproduces the issue, and manage corpora of “interesting” seeds to guide future fuzzing.
  7. Triage and deduplication
    • After crashes are detected, automated triage groups duplicates (same root cause), classifies severity, and collects debugging artifacts (stack trace, sanitizer output).
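
As one concrete illustration of item 7, a crude Python sketch that buckets sanitizer logs by their top stack frames (the crashes/ directory and the log format are assumptions):

import collections
import pathlib
import re

def signature(log_text: str, frames: int = 3) -> tuple:
    # Keep the function names from the first few "#N 0x... in func file:line" frames
    return tuple(re.findall(r"#\d+ .* in (\S+)", log_text)[:frames])

buckets = collections.defaultdict(list)
for log in pathlib.Path("crashes").glob("*.log"):
    buckets[signature(log.read_text(errors="ignore"))].append(log.name)

for sig, files in sorted(buckets.items(), key=lambda kv: -len(kv[1])):
    print(f"{len(files)} crash(es) with signature {sig}")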

How fuzzing works — step by step

  1. Choose the target
    • Could be a file parser (image, audio), protocol handler, CLI, library function, or an API endpoint.
  2. Prepare a harness
    • Create a small driver that receives raw bytes (or structured samples), calls the function under test, and reports failures. For binaries, you can fuzz the whole process; for libraries, fuzz the API function directly.
  3. Select a fuzzer and configure
    • Pick a fuzzer (mutation-based, generation-based, coverage-guided, etc.) and configure timeouts, memory limits, sanitizers, and the initial corpus (seed files).
  4. Instrumentation / sanitizers
    • Build the target with sanitizers (AddressSanitizer, UndefinedBehaviorSanitizer, LeakSanitizer) and with coverage hooks (if using coverage-guided fuzzing). Instrumentation enables detection and feedback.
  5. Run the fuzzer
    • The fuzzer runs thousands to millions of inputs, mutating seeds, tracking coverage, and prioritizing inputs that increase coverage.
  6. Detect and record failures
    • On crash or sanitizer report, the fuzzer saves the input and a log, optionally minimizing the input and capturing a stack trace.
  7. Triage
    • Deduplicate crashes (e.g., by stack trace), prioritize (security impact, reproducibility), and assign to developers with reproduction steps.
  8. Fix & regress
    • Developers fix bugs and add new regression tests (the minimized crashing input) to the test suite to prevent regressions.
  9. Continuous fuzzing
    • Add long-running fuzzing to nightly/CI (or to a fuzzing infrastructure) to keep finding issues as code changes.
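
To ground steps 2 through 5, a sketch of a Python harness for Google's Atheris, a coverage-guided fuzzer built on libFuzzer (assumes pip install atheris; myparser.parse is a placeholder for your target function):

# fuzz_parser.py: coverage-guided harness sketch using Atheris
import sys

import atheris

with atheris.instrument_imports():      # instrument the target for coverage feedback
    import myparser                      # hypothetical library under test

def TestOneInput(data: bytes):
    try:
        myparser.parse(data)             # feed raw bytes to the unit under test
    except ValueError:
        pass                             # a clean, documented rejection is not a bug

if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)   # corpus dirs, timeouts, etc. come from argv
    atheris.Fuzz()                          # run the mutate/execute/measure-coverage loop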

Types of fuzzing

By knowledge of the target

  • Black-box fuzzing
    • No knowledge of internal structure. Inputs are sent to the program and only external outcomes are observed (e.g., crash/no crash).
    • Cheap and easy to set up, but less efficient for deep code.
  • White-box fuzzing
    • Uses program analysis (symbolic execution or constraint solving) to craft inputs that satisfy specific paths/conditions.
    • Can find deep logical bugs but is computationally expensive and may not scale to large codebases.
  • Grey-box fuzzing
    • Hybrid approach: uses lightweight instrumentation (coverage) to guide mutations. Most modern practical fuzzers (AFL-family, libFuzzer) are grey-box.
    • Good balance of performance and depth.

By generation strategy

  • Mutation-based
    • Start from seed inputs and apply random or guided mutations (bit flips, splice, insert). Effective when good seeds exist.
  • Generation-based
    • Inputs are generated from a model/grammar (e.g., a JSON generator or network protocol grammar). Good for structured inputs and when valid format is critical.
  • Grammar-based
    • Use a formal grammar of the input format to generate syntactically valid/interesting inputs, often combined with mutation.

By goal/technique

  • Coverage-guided fuzzing
    • Uses runtime coverage to prefer inputs that exercise new code paths. Highly effective for native code.
  • Differential fuzzing
    • Runs the same input against multiple implementations (e.g., different JSON parsers) and looks for inconsistencies in outputs; see the sketch after this list.
  • Mutation + symbolic (concolic)
    • Combines concrete execution with symbolic analysis to solve comparisons and reach guarded branches.
  • Network / protocol fuzzing
    • Sends malformed packets/frames to network services; may require stateful harnesses to exercise authentication or session flows.
  • API / REST fuzzing
    • Targets HTTP APIs with unexpected payloads, parameter fuzzing, header fuzzing, and sequence fuzzing (order of calls).
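
A minimal differential-fuzzing sketch comparing the standard library's json module with simplejson (assumes simplejson is installed; a disagreement is a lead to investigate, not automatically a bug):

import json
import random
import string

import simplejson  # second implementation to compare against

def parse(impl, text):
    try:
        return ("ok", impl.loads(text))
    except Exception:
        return ("error", None)

rng = random.Random(1234)
for _ in range(100_000):
    # Crude generator: short bursts of printable characters, occasionally JSON-like
    text = "".join(rng.choice(string.printable) for _ in range(rng.randrange(1, 40)))
    a, b = parse(json, text), parse(simplejson, text)
    if a != b:
        print("Implementations disagree on", repr(text), "->", a, "vs", b)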

Advantages and benefits

  • High bug-finding power
    • Finds crashes, memory errors, and edge cases that manual tests and static analysis often miss.
  • Scalable and parallelizable
    • Many fuzzers scale horizontally — run multiple instances on many cores/machines.
  • Security-driven
    • Effective at revealing exploitable memory-safety bugs (especially for C/C++), reducing attack surface.
  • Automatable
    • Can be integrated into CI/CD or as long-running background jobs (nightly fuzzers).
  • Low human effort per test
    • After harness creation and configuration, fuzzing generates and runs vast numbers of tests automatically.
  • Regression prevention
    • Crashes found by fuzzing become regression tests that prevent reintroduction of bugs.

Limitations and considerations

  • Need a good harness or seeds
    • Mutation fuzzers need representative seed corpus; generation fuzzers need accurate grammars/models.
  • Can be noisy
    • Many crashes may be duplicates or low priority; triage is essential.
  • Not a silver bullet
    • Fuzzing targets runtime bugs; it won’t find logical errors that don’t cause abnormal behaviour unless you instrument checks.
  • Resource usage
    • Fuzzing can be CPU- and time-intensive. Long-running fuzzing infrastructure helps.
  • Coverage vs depth tradeoff
    • Coverage-guided fuzzers are excellent for code coverage, but for complex semantic checks you may need white-box techniques or custom checks.

Real-world examples (practical case studies)

Example 1 — Image parser in a media library

Scenario: A C++ image decoding library processes user-supplied images.
What you do:

  • Create a harness that takes raw bytes and calls the image decode function.
  • Seed with a handful of valid image files (PNG, JPEG).
  • Build with AddressSanitizer (ASan) and compile-time coverage instrumentation.
  • Run a coverage-guided fuzzer (mutation-based) for several days.
Outcome: The fuzzer generates a malformed chunk that causes a heap buffer overflow. ASan detects it; the input is minimized and stored. The developer fixes the bounds check and adds the minimized file as a regression test.

Why effective: Parsers contain lots of complex branches; small malformed bytes often trigger deep logic leading to memory safety issues.

Example 2 — HTTP API fuzzing for a microservice

Scenario: A REST microservice parses JSON payloads and stores data.
What you do:

  • Use a REST fuzzer that mutates fields, numbers, strings, and structure (or use generation from OpenAPI spec + mutation).
  • Include authentication tokens and sequence flows (create → update → delete).
  • Monitor for crashes, unhandled exceptions, incorrect status codes, and resource consumption.
Outcome: The fuzzer finds an unhandled null reference when a certain nested structure is missing, which surfaces as 500 errors. The fix adds input validation and better error handling.

Why effective: APIs often trust input structure; fuzzing uncovers missing validation, parsing edge cases, or unintended code paths.

Example 3 — Kernel / driver fuzzing (security focused)

Scenario: Fuzzing a kernel-facing driver interface (e.g., ioctls).
What you do:

  • Use a specialized kernel fuzzer that generates syscall sequences or malformed ioctl payloads, and runs on instrumented kernel builds.
  • Use persistent fuzzing clusters to run millions of testcases.
Outcome: A use-after-free triggered by a race between ioctl calls is discovered, leading to a CVE fix.

Why effective: Low-level interfaces such as ioctls are narrow but high-risk; fuzzers explore call sequences and inputs that humans rarely test.

How and when to use fuzzing (practical guidance)

When to fuzz

  • Parsers and deserializers (image, audio, video, document formats).
  • Protocol implementations (HTTP, TLS, custom binary protocols).
  • Native libraries in C/C++ — memory safety bugs are common here.
  • Security-critical code paths (authentication, cryptography wrappers, input validation).
  • Newly written code — fuzz early to catch regressions.
  • Third-party code you integrate: fuzzing can reveal hidden assumptions.

How to pick a strategy

  • If you have sample files → start with coverage-guided mutation fuzzer and seeds.
  • If input is structured (grammar) → use grammar-based or generation fuzzers.
  • If testing across implementations → differential fuzzing.
  • If deep logical constraints exist → consider white-box/concolic tooling or property-based tests.

Integrating fuzzing into your development process

Here’s a practical, step-by-step integration plan that works for teams of all sizes.

1) Start small — pick one high-value target

  • Choose a small, high-risk component (parser, protocol handler, or a library function).
  • Create a minimal harness that feeds arbitrary bytes (or structured inputs) to the function.

2) Build for fuzzing

  • Compile with sanitizers (ASan, UBSan) and enable coverage instrumentation (clang’s libFuzzer or AFL compile options).
  • Add deterministic seed corpus (valid samples) and known edge cases.

3) Local experiments

  • Run quick local fuzzing sessions to ensure harness is stable and crashes are reproducible.
  • Implement simple triage: crash minimization and stack traces.

4) Add fuzzing to CI (short runs)

  • Add a lightweight fuzz job to CI that runs for a short time (e.g., 10–30 minutes) on PRs that touch the target code.
  • If new issues are found, the PR should fail or annotate with findings.
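
One possible shape for such a job, wrapping the Atheris/libFuzzer harness sketched earlier with a wall-clock budget (paths and values are assumptions; -max_total_time and -artifact_prefix are standard libFuzzer options):

# ci_fuzz_gate.py: run the harness briefly on a PR and fail the job if anything is found
import pathlib
import subprocess
import sys

findings_dir = pathlib.Path("fuzz_findings")
findings_dir.mkdir(exist_ok=True)

result = subprocess.run([
    sys.executable, "fuzz_parser.py",
    f"-artifact_prefix={findings_dir}/",  # crashing inputs get written here
    "-max_total_time=600",                # roughly a 10-minute budget for a PR check
    "corpus/",                            # persisted seed corpus checked into the repo
])

crashes = sorted(findings_dir.glob("crash-*"))
if crashes or result.returncode != 0:
    print("Fuzzing found issues:", [p.name for p in crashes])
    sys.exit(1)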

5) Long-running fuzzing infrastructure

  • Run continuous/overnight fuzzing on dedicated workers (or cloud instances). Persist corpora and crashes.
  • Use parallel instances with different seeds and mutation strategies.

6) Automate triage and ticket creation

  • Use existing tools (or scripts) to group duplicate crashes, collect sanitizer outputs, and file tickets or create GitHub issues with reproducer and stack trace.

7) Make regression tests mandatory

  • Every fix must include the minimized crashing input as a unit/regression test. Add the file under tests/fuzz/regressors.
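
A sketch of replaying those minimized inputs as ordinary pytest cases (the directory name follows the convention above; myparser.parse stands in for your target):

# tests/test_fuzz_regressions.py: every past crasher must now be handled gracefully
import pathlib

import pytest

from myparser import parse  # hypothetical target under test

REGRESSOR_DIR = pathlib.Path(__file__).parent / "fuzz" / "regressors"

@pytest.mark.parametrize(
    "path", sorted(REGRESSOR_DIR.glob("*")), ids=lambda p: p.name
)
def test_previously_crashing_input_is_handled(path):
    data = path.read_bytes()
    try:
        parse(data)        # must not crash; raising a documented error is acceptable
    except ValueError:
        pass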

8) Expand coverage across the codebase

  • Once comfortable, gradually add more targets, including third-party libraries, and integrate API fuzzing for microservices.

9) Operational practices

  • Monitor fuzzing metrics: code coverage, unique crashes, time to first crash, triage backlog.
  • Rotate seeds, update grammars, and re-run fuzzers after major changes.
  • Educate developers on writing harnesses and interpreting sanitizer output.

Practical tips & best practices

  • Use sanitizers (ASan/UBSan/MSan) to catch subtle memory and undefined behaviour.
  • Start with good seeds — a few valid samples dramatically improves mutation fuzzers.
  • Minimize crashing inputs automatically to simplify debugging.
  • Keep harnesses stable — harnesses that themselves crash or leak make fuzzing results noisy.
  • Persist and version corpora — adding new seeds that found coverage helps future fuzzes.
  • Prioritize triage — a backlog of unanalyzed crashes wastes value.
  • Use fuzzing results as developer-owned responsibilities — failing to fix crashes undermines confidence in fuzzing.

Example minimal harness (pseudocode)

C (using libFuzzer-style entry):

#include <stddef.h>
#include <stdint.h>

// target function in your library
extern int parse_image(const uint8_t *data, size_t size);

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // call into the library under test
    parse_image(data, size);
    return 0; // libFuzzer expects 0; non-zero return values are reserved
}

Python harness for a CLI program (mutation via custom fuzzer):

import random
import subprocess
import tempfile

def run_one(input_bytes):
    # Write the candidate input to a temp file and run the tool on it
    with tempfile.NamedTemporaryFile() as f:
        f.write(input_bytes)
        f.flush()
        # run() does not raise on a non-zero exit; inspect the return code ourselves
        return subprocess.run(["/path/to/mytool", f.name], timeout=5).returncode

# fuzzing loop (very simple, mutation-based)
seeds = [b"\x89PNG...", b"\xff\xd8..."]
while True:
    s = bytearray(random.choice(seeds))
    # random byte-level mutation
    for _ in range(10):
        i = random.randrange(len(s))
        s[i] = random.randrange(256)
    try:
        rc = run_one(bytes(s))
    except subprocess.TimeoutExpired:
        print("Hang on input:", bytes(s))
        break
    if rc < 0:  # terminated by a signal (e.g., SIGSEGV), so likely a crash
        print(f"Crash (signal {-rc}) on input:", bytes(s))
        break

Suggested tools & ecosystem (conceptual, pick what fits your stack)

  • Coverage-guided fuzzers: libFuzzer, AFL/AFL++ family, honggfuzz.
  • Grammar/generation: Peach, LangFuzz, custom generators (JSON/XML/ASN.1).
  • API/HTTP fuzzers: OWASP ZAP, Burp Intruder/Extender, custom OpenAPI-based fuzzers.
  • Infrastructure: OSS-Fuzz (for open source projects), self-hosted clusters, cloud instances.
  • Sanitizers: AddressSanitizer, UndefinedBehaviorSanitizer, LeakSanitizer, MemorySanitizer.
  • CI integration: run short fuzz sessions in PR checks; long runs on scheduled runners.

Note: choose tools that match your language and build system. For many C/C++ projects, libFuzzer + ASan is a well-supported starter combo; for binaries without recompilation, AFL with QEMU mode or network fuzzers may be used.

Quick checklist to get started (copy into your project README)

  • Pick target (parser, API, library function).
  • Create minimal harness and seed corpus.
  • Build with sanitizers and coverage instrumentation.
  • Run a local fuzzing session and collect crashes.
  • Minimize crashes and add regressors to test suite.
  • Add short fuzz job to PR CI; schedule long fuzz runs nightly.
  • Automate triage and track issues.

Conclusion

Fuzzing is one of the highest-leverage testing techniques for finding low-level crashes and security bugs. Start with one target, instrument with sanitizers and coverage, run both short CI fuzz jobs and long-running background fuzzers, and make fixing and regressing fuzz-found issues part of your development flow. Over time you’ll harden parsers, network stacks, and critical code paths — often catching bugs that would have become security incidents in production.
