Networking in C · intermediate · ~15 min

Parse an HTTP/1.1 request line in C

Walk through an HTTP request line and extract the method, path, and version into a struct.

Overview

Read the three tokens, copy them into a struct with bounded copies, and reject the input on overflow or a bad terminator.

Why it matters

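The request line is the first attacker-controlled text an HTTP server parses. A parser that enforces length bounds and an allow-list of path characters rejects the easiest attacks, such as oversized tokens and unexpected bytes, before they reach the rest of the program.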

Lesson

Why this matters

Every web proxy, every WAF (web application firewall), and every reverse proxy that writes an access log starts the same way. It reads the first line of an HTTP request and pulls out three tokens:

the method (for example GET)
the request-target (the path, for example /index.html)
the version (for example HTTP/1.1)

The request line is plain ASCII text and ends in \r\n (a carriage return followed by a line feed). Because the format is this simple, a C parser for it is short and worth reading closely.

Tools like Burp Suite, mitmproxy, and nginx perform this same parsing step before they display or log a request. Here, we write it ourselves.

What the wire looks like

A raw request arrives like this:

GET /index.html HTTP/1.1\r\n
Host: example.com\r\n
\r\n

The request line is the first line: three tokens separated by single spaces, followed by \r\n.
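
To make the terminator rule concrete, here is a minimal sketch of a helper that measures the request line and insists it ends in \r\n. The name request_line_length is hypothetical, and the sketch assumes buf is a NUL-terminated string; code reading from a socket would also have to track how many bytes have actually arrived.

```c
#include <string.h>

/* Hypothetical helper: returns the length of the request line
 * (not counting the "\r\n"), or -1 if the line is not terminated
 * by CRLF. Assumes buf is NUL-terminated. */
long request_line_length(const char *buf)
{
    /* Index of the first CR or LF, or of the NUL if neither occurs. */
    size_t n = strcspn(buf, "\r\n");

    /* The first line-ending byte must be CR, immediately followed by LF.
     * This rejects a bare LF, a bare CR, and a missing terminator.
     * Short-circuit evaluation means buf[n + 1] is only read when
     * buf[n] is '\r', so we never read past the NUL. */
    if (buf[n] != '\r' || buf[n + 1] != '\n')
        return -1;

    return (long)n;
}
```

For the request above it returns 24, the length of GET /index.html HTTP/1.1. Stopping at the first \r or \n, rather than searching for the first \r\n anywhere in the buffer, also rejects a stray line feed or carriage return hidden inside the line.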

Your job

Implement this function:

int parse_request_line(const char *buf, http_req_t *out);

The http_req_t struct holds three fixed-size (bounded) char arrays:

method[8]
path[256]
version[16]

Return 0 on success, or -1 on any malformed input.
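
The lesson gives only the field sizes, so the declarations below are one plausible way to write them. They assume each size includes the terminating NUL, which means method can hold at most 7 characters, path 255, and version 15. copy_token is a hypothetical helper showing the bounded copy the rules ask for: it checks the length first and only then calls memcpy.

```c
#include <string.h>

/* Field sizes from the exercise. Assumption: each size includes the
 * trailing NUL, so the longest storable token is one byte shorter. */
typedef struct {
    char method[8];
    char path[256];
    char version[16];
} http_req_t;

/* Returns 0 on success, -1 on any malformed input. */
int parse_request_line(const char *buf, http_req_t *out);

/* Hypothetical helper: copy len bytes of src into dst (capacity cap)
 * and NUL-terminate. Returns -1 for an empty token or one that would
 * not fit, instead of silently truncating it. */
static int copy_token(char *dst, size_t cap, const char *src, size_t len)
{
    if (len == 0 || len >= cap)   /* ">=" keeps one byte for the NUL */
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```

Because copy_token refuses a token of length cap or more instead of truncating it, an over-long field makes the whole parse fail, which is what the first rule requires.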

Rules

Reject the input if any field would overflow its bound. Never use strcpy without a bounds check.
Reject the line if it does not end in \r\n.
The path may contain /, alphanumerics, ?, &, =, ., -, and _. Reject any other character for this exercise.
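
The path rule is an allow-list: anything not explicitly permitted is rejected. A minimal sketch, with hypothetical names path_char_ok and path_ok, might look like this. It compares against ASCII ranges directly instead of calling isalnum, whose result can depend on the current locale.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Allow-list from the exercise rules: ASCII letters and digits,
 * plus the punctuation / ? & = . - _ */
static bool path_char_ok(unsigned char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9'))
        return true;
    /* strchr also "finds" the string's own NUL, so exclude c == 0. */
    return c != '\0' && strchr("/?&=.-_", c) != NULL;
}

/* Hypothetical check over a token that is not NUL-terminated:
 * p points at the path bytes, len is the token length. */
bool path_ok(const char *p, size_t len)
{
    if (len == 0)                 /* an empty path is malformed */
        return false;
    for (size_t i = 0; i < len; i++)
        if (!path_char_ok((unsigned char)p[i]))
            return false;
    return true;
}
```

Since % is not on the list, percent-encoded bytes such as %2e are rejected outright. The rules do allow ., so this check alone does not stop ../ sequences; handling those would need a separate step.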

Common mistakes

Using sscanf("%s %s %s", ...) without field widths. Each %s writes as many bytes as the token contains, so a long token overruns its buffer.
Forgetting to check for the \r\n terminator.
Allowing absurdly long paths because the per-field bound was never enforced.
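
Adding field widths stops the overflow, but it does not turn sscanf into a parser for this format. The hypothetical bounded_scan below uses widths of 7, 255, and 15, one less than each array size to leave room for the NUL.

```c
#include <stdio.h>

/* Hypothetical demo of the sscanf pitfall. The widths prevent the
 * buffer overflow, but sscanf still reports success when a token is
 * truncated, and it treats any whitespace as a separator. */
int bounded_scan(const char *line, char method[8], char path[256],
                 char version[16])
{
    /* Each width is the array size minus one, for the NUL. */
    return sscanf(line, "%7s %255s %15s", method, path, version);
}
```

Given "VERYLONGMETHOD / HTTP/1.1\r\n", it still returns 3: method receives VERYLON, path receives the leftover GMETHOD, and version receives /. The truncation is silent, and because %s treats tabs, \r, and \n as ordinary whitespace, sscanf also cannot tell a single space from a tab, and cannot enforce the \r\n terminator.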

What this is NOT

A full HTTP parser. Headers, the body, and chunked encoding are all skipped.
A request-smuggling detector. That topic lives in parse-http-smuggling-defence.

Summary

Key takeaways

The HTTP request line has three space-separated tokens: method, path, and version.
Split on the spaces and copy each token into a fixed-size field.
Always enforce explicit length checks. Reject input that overflows a field or lacks the \r\n terminator.
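
Putting the steps together, here is one possible shape for the complete function. It is a sketch under the same assumptions as the rest of the lesson, not a reference solution: buf is a NUL-terminated string, each field size includes the NUL, and copy_token and path_char_ok are hypothetical helper names. It parses into a local struct and copies it to *out only on success, so a rejected line leaves the caller's struct untouched.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char method[8];
    char path[256];
    char version[16];
} http_req_t;

/* Bounded copy: fails on an empty token or one with no room for NUL. */
static int copy_token(char *dst, size_t cap, const char *src, size_t len)
{
    if (len == 0 || len >= cap)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Path allow-list: ASCII letters, digits, and / ? & = . - _ */
static bool path_char_ok(unsigned char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9'))
        return true;
    return c != '\0' && strchr("/?&=.-_", c) != NULL;
}

int parse_request_line(const char *buf, http_req_t *out)
{
    if (buf == NULL || out == NULL)
        return -1;

    /* 1. The line must end in CRLF, with no bare CR or LF before it. */
    size_t line_len = strcspn(buf, "\r\n");
    if (buf[line_len] != '\r' || buf[line_len + 1] != '\n')
        return -1;
    const char *end = buf + line_len;   /* points at the '\r' */

    /* 2. Locate exactly two spaces inside the line. */
    const char *sp1 = memchr(buf, ' ', line_len);
    if (sp1 == NULL)
        return -1;
    const char *path = sp1 + 1;
    const char *sp2 = memchr(path, ' ', (size_t)(end - path));
    if (sp2 == NULL)
        return -1;
    const char *version = sp2 + 1;
    if (memchr(version, ' ', (size_t)(end - version)) != NULL)
        return -1;                      /* a third space: too many tokens */

    size_t method_len = (size_t)(sp1 - buf);
    size_t path_len = (size_t)(sp2 - path);
    size_t version_len = (size_t)(end - version);

    /* 3. Every path byte must be on the allow-list. */
    for (size_t i = 0; i < path_len; i++)
        if (!path_char_ok((unsigned char)path[i]))
            return -1;

    /* 4. Bounded copies; empty or over-long tokens reject the line. */
    http_req_t tmp;
    if (copy_token(tmp.method, sizeof tmp.method, buf, method_len) != 0 ||
        copy_token(tmp.path, sizeof tmp.path, path, path_len) != 0 ||
        copy_token(tmp.version, sizeof tmp.version, version, version_len) != 0)
        return -1;

    *out = tmp;                         /* publish only on success */
    return 0;
}
```

The method and version are checked only for length here. A stricter parser might also allow-list method characters and compare the version against known values such as HTTP/1.1, but the exercise rules do not require it.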