Secure Coding in C · intermediate · ~20 min

Safe parsing — the defensive parser shape

Write parsers that refuse malformed input early and obviously.

Overview

Defensive parsing is strict by construction. You write a grammar; the parser refuses anything outside it; on rejection it returns a clear error code and frees any partial state.

Why it matters

Parser bugs appear across vulnerability classes: HTTP request smuggling, XML external entity (XXE) injection, JSON prototype pollution (in C extensions), file-format escape, archive zip-slip. Strict parsing is often the cheapest defense.

Core concepts

Cap before parse. Read at most MAX_INPUT bytes. Refuse longer.
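As a minimal sketch of the cap, assuming the input arrives on a FILE * and that 4096 bytes is an acceptable limit: the hypothetical helper read_capped below reads at most MAX_INPUT bytes and refuses the whole input if even one more byte is available. Neither the helper name nor the limit comes from a standard library.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_INPUT 4096  /* assumed limit; pick one that fits your format */

/* Reads all of fp into buf, which must hold MAX_INPUT + 1 bytes, and
 * NUL-terminates it. Returns 0 on success, -1 if the input is longer
 * than MAX_INPUT or a read error occurs. Hypothetical helper. */
int read_capped(FILE *fp, char *buf, size_t *len_out)
{
    size_t n;

    if (!fp || !buf || !len_out) return -1;

    n = fread(buf, 1, MAX_INPUT, fp);
    if (ferror(fp)) return -1;

    /* The buffer is full: if one more byte exists, the input is too long. */
    if (n == MAX_INPUT && fgetc(fp) != EOF) return -1;
    if (ferror(fp)) return -1;

    buf[n] = '\0';
    *len_out = n;   /* report the length: the data may contain NUL bytes */
    return 0;
}
```

The length is returned separately so that later stages do not have to rediscover it with strlen, which would stop at the first embedded NUL.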

Tokenize, don't regex. Tokens have explicit grammars; regexes hide bugs.
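To illustrate, here is a sketch of a tokenizer for a deliberately tiny, made-up grammar of the form key=123. The token names, struct token, and next_token are all hypothetical; the point is that each token class is spelled out byte by byte, so anything else is visibly an error.

```c
#include <stddef.h>

/* Hypothetical token set for a tiny "key=123" grammar. */
enum tok_kind { TOK_IDENT, TOK_EQUALS, TOK_NUMBER, TOK_END, TOK_ERROR };

struct token {
    enum tok_kind kind;
    const char *start;  /* first byte of the token inside the input */
    size_t len;         /* number of bytes in the token */
};

/* Explicit ASCII ranges; <ctype.h> classification depends on the locale. */
static int is_lower(char c) { return c >= 'a' && c <= 'z'; }
static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Returns the token that begins at *p and advances *p past it.
 * A byte outside the grammar yields TOK_ERROR and leaves *p unchanged. */
struct token next_token(const char **p)
{
    const char *s = *p;
    struct token t = { TOK_ERROR, s, 0 };

    if (*s == '\0') {
        t.kind = TOK_END;
    } else if (*s == '=') {
        t.kind = TOK_EQUALS;
        t.len = 1;
    } else if (is_lower(*s)) {
        t.kind = TOK_IDENT;
        while (is_lower(s[t.len])) t.len++;
    } else if (is_digit(*s)) {
        t.kind = TOK_NUMBER;
        while (is_digit(s[t.len])) t.len++;
    }
    /* Any other byte falls through with kind TOK_ERROR and len 0. */

    *p = s + t.len;
    return t;
}
```

Given "port=8080" this yields TOK_IDENT, TOK_EQUALS, TOK_NUMBER, TOK_END; given "port 8080" the second call returns TOK_ERROR because a space is not part of the grammar.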

Default reject. Every switch / branch should end with a default: return -1; that explicitly refuses unknown input.
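A small sketch of the default-reject shape, assuming an invented protocol with three one-byte opcodes. The enum op values and decode_op are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical one-byte opcodes for an illustrative protocol. */
enum op { OP_GET, OP_PUT, OP_DEL };

/* Maps an opcode byte to enum op. Returns 0 on success and -1 for any
 * byte that is not explicitly listed. */
int decode_op(unsigned char byte, enum op *out)
{
    if (!out) return -1;

    switch (byte) {
    case 'G': *out = OP_GET; return 0;
    case 'P': *out = OP_PUT; return 0;
    case 'D': *out = OP_DEL; return 0;
    default:  return -1;   /* unknown input: refuse, do not guess */
    }
}
```

Note that lowercase 'g' is rejected rather than folded to 'G'. Accepting near-misses is exactly the kind of leniency the default branch exists to prevent.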

Pentester mindset. When two parsers disagree about what a piece of input means, you have a vulnerability (request smuggling, HTTP/2 desync, etc.). Strictness reduces the disagreement surface.
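One way to shrink the disagreement surface is to refuse input that could be framed in more than one way. The sketch below assumes headers have already been split into name/value pairs and applies one possible strict policy: reject a repeated Content-Length, and reject Content-Length combined with Transfer-Encoding. The struct header type and check_framing are hypothetical, and this is not a complete HTTP implementation.

```c
#include <stddef.h>

struct header { const char *name; const char *value; };  /* hypothetical */

/* ASCII case-insensitive equality, written out by hand because
 * strcasecmp is POSIX rather than ISO C. */
static int name_eq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++) {
        char ca = *a, cb = *b;
        if (ca >= 'A' && ca <= 'Z') ca = (char)(ca - 'A' + 'a');
        if (cb >= 'A' && cb <= 'Z') cb = (char)(cb - 'A' + 'a');
        if (ca != cb) return 0;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Returns 0 if the body framing is unambiguous, -1 otherwise. */
int check_framing(const struct header *h, size_t n)
{
    size_t i, cl = 0, te = 0;

    if (!h && n > 0) return -1;

    for (i = 0; i < n; i++) {
        if (name_eq(h[i].name, "content-length")) cl++;
        else if (name_eq(h[i].name, "transfer-encoding")) te++;
    }
    if (cl > 1) return -1;          /* two lengths: which one wins? */
    if (cl > 0 && te > 0) return -1; /* two framing methods: refuse */
    return 0;
}
```

A front-end and a back-end that both apply a rule like this cannot be made to pick different body lengths for the same request, because neither accepts the ambiguous form.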

Defensive coding habit. Fail closed: on any error, free partial state and return a non-zero code. Never carry on with corrupted intermediate state.

Syntax notes

See the state-machines lesson for the FSM skeleton and the input-validation lesson for the boundary discipline.

Lesson

A safe parser (1) caps input length, (2) refuses anything not in its grammar, (3) reports failure with a specific error code, and (4) leaves the program in a clean state on failure. The opposite approach, 'parse what you can, ignore the rest', is a recurring source of parser vulnerabilities.

Code examples

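#include <string.h>     /* strlen */
#define MAX_INPUT 4096  /* example cap (assumed value); choose one that fits your format */
int parse_strict(const char *s){
    if (!s || strlen(s) > MAX_INPUT) return -1;   /* cap before parse */
    /* ... parse, refusing anything unexpected ... */
    return 0;   /* reached only if every check passed */
}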

Line by line

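#include <errno.h>    /* errno, ERANGE */
#include <stdlib.h>   /* strtol */

int parse_strict_int(const char *s, int lo, int hi, int *out){
    if (!s || !out) return -1;
    /* strtol skips leading whitespace and accepts '+'; reject both here */
    if (*s != '-' && (*s < '0' || *s > '9')) return -1;
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0') return -1;   /* no digits or trailing garbage: reject */
    if (errno == ERANGE) return -1;            /* overflowed long: reject */
    if (v < lo || v > hi) return -1;           /* out of range: reject */
    *out = (int)v;
    return 0;
}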

Common mistakes

  • Permissive parsing ('be liberal in what you accept'). For untrusted input this is a security anti-pattern: every lenient extension is input that another parser may read differently.

Debugging tips

Hand-roll a corpus of malformed inputs and run your parser against each; assert each is rejected.
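A table-driven harness keeps such a corpus cheap to extend. The sketch below is one possible shape: count_wrongly_accepted, the MALFORMED table, and the stand-in parser parse_small_number (which accepts one to three ASCII digits) are all hypothetical names invented for this example.

```c
#include <stddef.h>

/* Parser under test: returns 0 if it accepts the input, non-zero if not. */
typedef int (*parse_fn)(const char *input);

/* Returns how many corpus entries the parser wrongly accepted. */
size_t count_wrongly_accepted(parse_fn parse,
                              const char *const *corpus, size_t n)
{
    size_t i, bad = 0;
    for (i = 0; i < n; i++)
        if (parse(corpus[i]) == 0) bad++;
    return bad;
}

/* Stand-in parser for the demo: one to three ASCII digits, nothing else. */
int parse_small_number(const char *s)
{
    size_t n = 0;
    if (!s) return -1;
    for (; s[n] != '\0'; n++)
        if (s[n] < '0' || s[n] > '9' || n >= 3) return -1;
    return n > 0 ? 0 : -1;
}

/* Every entry here is outside the grammar and must be rejected. */
const char *const MALFORMED[] = {
    "", " 1", "1 ", "+1", "-1", "1a", "0x1", "1234", "1\n", "1.0"
};
#define MALFORMED_COUNT (sizeof MALFORMED / sizeof MALFORMED[0])
```

A test then asserts that count_wrongly_accepted(parse_small_number, MALFORMED, MALFORMED_COUNT) is zero. Whenever a bug report or fuzzer finds a new bad input, add it to the table so the regression stays covered.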

Memory safety

Always free partial allocations on the failure path. A common bug is allocating into a struct, hitting a parse error, and returning without freeing.
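A common way to keep the failure path honest is a single cleanup label plus a temporary that is only copied to the caller on success. The sketch below assumes a made-up "name:age" format with an age of one to three digits; struct person and parse_person are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct person { char *name; int age; };   /* hypothetical record type */

/* Parses "name:age". On success fills *out and returns 0; the caller
 * then owns out->name and must free it. On failure returns -1, frees
 * anything allocated so far, and leaves *out untouched. */
int parse_person(const char *s, struct person *out)
{
    struct person tmp = { NULL, 0 };
    const char *colon, *p;
    size_t name_len, digits = 0;
    int rc = -1;

    if (!s || !out) return -1;

    colon = strchr(s, ':');
    if (!colon || colon == s) goto done;     /* missing or empty name */

    name_len = (size_t)(colon - s);
    tmp.name = malloc(name_len + 1);
    if (!tmp.name) goto done;
    memcpy(tmp.name, s, name_len);
    tmp.name[name_len] = '\0';

    /* From here on, every error must release tmp.name. */
    p = colon + 1;
    if (*p == '\0') goto done;               /* empty age */
    for (; *p != '\0'; p++) {
        if (*p < '0' || *p > '9' || ++digits > 3) goto done;
        tmp.age = tmp.age * 10 + (*p - '0');
    }

    *out = tmp;        /* commit only after every check has passed */
    tmp.name = NULL;   /* ownership has moved to the caller */
    rc = 0;
done:
    free(tmp.name);    /* free(NULL) is a no-op, so this is always safe */
    return rc;
}
```

Because every exit after the allocation goes through the same label, adding a new validation step later cannot introduce a leak as long as it also uses goto done.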

Real-world uses

HTTP parsers, config-file readers, packet decoders, image-format loaders, every interface between your program and untrusted bytes.

Practice tasks

  1. Write a strict integer parser with bounds and trailing-garbage rejection.
  2. Write a strict CSV-row parser.
  3. Audit one of your existing parsers for permissive behaviour.

Summary

Cap, tokenize, default-reject, fail closed: the four-step recipe for parsers that don't ship CVEs.
