cybersecurity · intermediate · ~25 min

Extract the host from a URL safely

Strict URL host extraction with explicit reject rules.

Challenge

Implement int extract_host(const char *url, char *out_host, size_t cap).

The URL is expected in the form scheme://host[:port][/path][?query][#fragment]. Copy just the host part into out_host and NUL-terminate it. The host must fit in cap-1 bytes plus the terminator; a host that does not fit is rejected, not truncated.

Return:

  • 1 on success.
  • 0 on any of: missing scheme, missing ://, empty host, userinfo present, host doesn't fit in out_host.

The function MUST refuse:

  • URLs with no ://.
  • URLs where the host part is empty.
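  • URLs with userinfo (http://user:pass@host/...). Refuse rather than parse: userinfo is a known vector for phishing and host-confusion attacks, where the text before @ is mistaken for the real host.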
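
One way to apply the userinfo rule is to scan only the authority (the text right after ://) and stop at the first /, ? or #, so that an @ appearing later in the path or query does not trigger a false reject. The helper name authority_has_userinfo is hypothetical and not part of the exercise; this is a minimal sketch, not the reference solution.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the exercise API).
 * 'authority' points at the first byte after "://".
 * Returns 1 if an '@' appears before the authority ends, else 0. */
static int authority_has_userinfo(const char *authority)
{
    for (const char *p = authority; *p != '\0'; p++) {
        if (*p == '/' || *p == '?' || *p == '#')
            return 0;   /* authority ended; a later '@' is path, query or fragment data */
        if (*p == '@')
            return 1;   /* userinfo delimiter inside the authority */
    }
    return 0;           /* reached end of string without seeing '@' */
}
```

With this check, "user:pw@evil/foo" is flagged, while "example.com/a@b" is not, because the scan stops at the first slash.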

Examples

extract_host("http://example.com/foo")           -> 1, out_host="example.com"
extract_host("https://api.example.com:8443/v1")  -> 1, out_host="api.example.com"
extract_host("http://127.0.0.1")                 -> 1, out_host="127.0.0.1"
extract_host("file:///etc/hosts")                -> 0   // empty host
extract_host("not-a-url")                        -> 0
extract_host("http://user:pw@evil/foo")          -> 0   // userinfo refused
extract_host("")                                 -> 0

Why this matters

URL parsing is a notorious source of security bugs (CVEs in nginx, curl, browsers). Real production code uses a vetted library. But understanding the structure — and writing a strict, conservative parser — teaches you what those libraries are doing and why they reject so many edge cases.

Input format

url is a NUL-terminated string; out_host is a writable buffer of cap bytes.

Output format

1 and NUL-terminated host in out_host, or 0.

Constraints

No external parsers; pure C scanning. Cap-aware copy.
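
A cap-aware copy can be sketched as a small helper that checks the length before writing anything. The name copy_capped and its argument order are assumptions made for illustration. Checking cap == 0 first matters, because cap - 1 would otherwise wrap around to a huge size_t value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the exercise API).
 * Copies exactly 'len' bytes of 'src' into 'out' and NUL-terminates.
 * Returns 0 without writing if len + 1 bytes do not fit in 'cap'. */
static int copy_capped(char *out, size_t cap, const char *src, size_t len)
{
    if (out == NULL || src == NULL || cap == 0)
        return 0;           /* nowhere to write, not even the terminator */
    if (len > cap - 1)
        return 0;           /* reject instead of truncating */
    memcpy(out, src, len);
    out[len] = '\0';
    return 1;
}
```

For an 8-byte buffer, a 7-byte host fits (7 bytes plus the terminator) and an 8-byte host is rejected.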

Starter code

#include <stddef.h>
int extract_host(const char *url, char *out_host, size_t cap) { /* TODO */ return 0; }

Common mistakes

Accepting userinfo and using it as the host. Forgetting to handle the optional :port. Not capping the copy at cap-1.

Edge cases to handle

Empty URL; URL with no path; IPv4 literal; userinfo (must reject).
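
Putting the rules together, one possible shape for the function is sketched below. It is an illustration under stated assumptions, not the graded reference solution: it treats NULL pointers and cap == 0 as failures (the statement does not say), it ends the host at the first colon, so bracketed IPv6 literals are out of scope, and it does not validate the characters of the scheme.

```c
#include <stddef.h>
#include <string.h>

int extract_host(const char *url, char *out_host, size_t cap)
{
    /* Assumption: NULL arguments and a zero-sized buffer are failures. */
    if (url == NULL || out_host == NULL || cap == 0)
        return 0;

    /* Require "://" preceded by a non-empty scheme. */
    const char *sep = strstr(url, "://");
    if (sep == NULL || sep == url)
        return 0;

    /* Scan the authority: it ends at '/', '?', '#' or end of string. */
    const char *host = sep + 3;
    const char *auth_end = host;
    while (*auth_end != '\0' && *auth_end != '/' &&
           *auth_end != '?' && *auth_end != '#') {
        if (*auth_end == '@')
            return 0;                   /* userinfo present: refuse */
        auth_end++;
    }

    /* The host stops at the optional ":port", if any. */
    const char *colon = memchr(host, ':', (size_t)(auth_end - host));
    const char *host_end = (colon != NULL) ? colon : auth_end;

    size_t len = (size_t)(host_end - host);
    if (len == 0 || len > cap - 1)
        return 0;                       /* empty host, or it does not fit */

    memcpy(out_host, host, len);
    out_host[len] = '\0';
    return 1;
}
```

Each input byte is visited a bounded number of times (strstr, the authority scan, and memchr), so the sketch stays within the O(strlen(url)) budget. Under the assumptions above it reproduces the seven results listed under Examples.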

Complexity

O(strlen(url)).
