cybersecurity · intermediate · ~25 min
Strict URL host extraction with explicit reject rules.
Implement int extract_host(const char *url, char *out_host, size_t cap).
The URL is expected in the form scheme://host[:port][/path][?query][#fragment]. Extract just the host part into out_host, NUL-terminated, capped to cap-1 bytes.
Return:
1 on success.
0 on any of: missing scheme, missing ://, empty host, host doesn't fit in out_host.
The function MUST refuse:
URLs without ://.
URLs with userinfo (http://user:pass@host/...): refuse rather than parse (a known XSS vector).
extract_host("http://example.com/foo") -> 1, out_host="example.com"
extract_host("https://api.example.com:8443/v1") -> 1, out_host="api.example.com"
extract_host("http://127.0.0.1") -> 1, out_host="127.0.0.1"
extract_host("file:///etc/hosts") -> 0 // empty host
extract_host("not-a-url") -> 0
extract_host("http://user:pw@evil/foo") -> 0 // userinfo refused
extract_host("") -> 0
URL parsing is a notorious source of security bugs (CVEs in nginx, curl, browsers). Real production code uses a vetted library. But understanding the structure — and writing a strict, conservative parser — teaches you what those libraries are doing and why they reject so many edge cases.
Input: url is a NUL-terminated string; out_host is a buffer of cap bytes.
Output: 1 with the NUL-terminated host written to out_host, or 0.
Constraints: no external parsers; pure C scanning with a cap-aware copy.
#include <stddef.h>
int extract_host(const char *url, char *out_host, size_t cap) { /* TODO */ return 0; }
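One possible way to fill in the stub above is sketched below. It is a minimal sketch under stated assumptions, not a reference solution: the scheme check (a letter followed by letters, digits, '+', '-' or '.') is extra strictness that the exercise text does not spell out, and clearing out_host on failure is a choice made here rather than a stated requirement.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Sketch of a strict host extractor: 1 on success, 0 on any reject. */
int extract_host(const char *url, char *out_host, size_t cap) {
    if (url == NULL || out_host == NULL || cap == 0) return 0;
    out_host[0] = '\0';                 /* assumption: defined value on failure */

    const char *sep = strstr(url, "://");
    if (sep == NULL || sep == url) return 0;   /* missing "://" or empty scheme */

    /* Assumption: scheme is a letter followed by letters, digits, '+', '-', '.'. */
    if (!isalpha((unsigned char)url[0])) return 0;
    for (const char *p = url + 1; p < sep; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalnum(c) && c != '+' && c != '-' && c != '.') return 0;
    }

    /* The authority runs from after "://" to the first '/', '?' or '#'. */
    const char *host = sep + 3;
    size_t auth_len = strcspn(host, "/?#");

    /* Refuse userinfo: any '@' inside the authority. */
    if (memchr(host, '@', auth_len) != NULL) return 0;

    /* An optional ":port" ends the host at the first ':' in the authority. */
    const char *colon = memchr(host, ':', auth_len);
    size_t host_len = colon ? (size_t)(colon - host) : auth_len;

    if (host_len == 0) return 0;        /* e.g. file:///etc/hosts */
    if (host_len >= cap) return 0;      /* need host_len bytes plus the NUL */

    memcpy(out_host, host, host_len);
    out_host[host_len] = '\0';
    return 1;
}
```

The sketch rejects a host that does not fit instead of truncating it, since a silently shortened host name is a different host. Bracketed IPv6 literals such as [::1] are outside the exercise's URL form and are not handled here.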
Common pitfalls: accepting userinfo and using it as the host; forgetting to handle the optional :port; not capping the copy at cap-1 bytes.
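Two of these pitfalls can be isolated into small helpers, sketched here. The names has_userinfo and copy_capped are hypothetical and not part of the exercise. The first looks for '@' only inside the authority, so an '@' in the path or query does not cause a false reject. The second refuses when the data does not fit rather than truncating.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if the authority part of url contains '@'. */
int has_userinfo(const char *url) {
    const char *sep = strstr(url, "://");
    if (sep == NULL) return 0;
    const char *auth = sep + 3;
    /* Only scan up to the first '/', '?' or '#': later '@' signs are harmless. */
    size_t auth_len = strcspn(auth, "/?#");
    return memchr(auth, '@', auth_len) != NULL;
}

/* Hypothetical helper: copy len bytes plus a NUL into dst, or refuse.
 * Returns 0 and leaves dst untouched when len + 1 bytes do not fit in cap. */
int copy_capped(char *dst, size_t cap, const char *src, size_t len) {
    if (cap == 0 || len >= cap) return 0;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 1;
}
```

For "http://user:pw@evil/foo", a scanner that stops at the first ':' would report "user" as the host, while the connection would actually go to "evil". Checking for '@' before looking for the port separator avoids that mismatch.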
Edge cases: empty URL; URL with no path; IPv4 literal; userinfo (must reject).
Complexity: O(strlen(url)).