cybersecurity · intermediate · ~15 min · safe pentest lab

Score a URL for phishing structural smells

Heuristic scoring with explicit, auditable rules.

Challenge

Score a URL for structural phishing smells — the cheap, auditable first layer that runs before any ML model.

Task

Implement int phishy_score(const char *url) that adds 1 point for each rule the URL matches and returns the total (or -1 if url is NULL).

Rules, 1 point each:

1. The URL contains @.
2. The hostname has more than 4 dots.
3. The hostname is longer than 40 characters.
4. The hostname has a run of 3 or more consecutive digits.
5. The URL contains xn--.
6. The hostname contains - and a brand keyword from {paypal, apple, bank, microsoft, google, amazon} (case-insensitive).
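The four simplest rules (the @ check, the dot count, the length check, and the xn-- check) can be sketched as below. The names count_dots and simple_rules are hypothetical and not part of the exercise API; host and len are assumed to describe a hostname that has already been extracted, and the xn-- match is assumed to be case-sensitive because the task does not say otherwise.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts '.' characters in the first len bytes of host. */
static size_t count_dots(const char *host, size_t len) {
    size_t dots = 0;
    for (size_t i = 0; i < len; i++) {
        if (host[i] == '.') dots++;
    }
    return dots;
}

/* Hypothetical helper: scores the @, dot-count, length, and xn-- rules.
 * url is the full NUL-terminated URL; host/len is the hostname span. */
static int simple_rules(const char *url, const char *host, size_t len) {
    int score = 0;
    if (strchr(url, '@') != NULL) score++;      /* '@' anywhere in the URL */
    if (count_dots(host, len) > 4) score++;     /* strictly more than 4 dots */
    if (len > 40) score++;                      /* strictly longer than 40 chars */
    if (strstr(url, "xn--") != NULL) score++;   /* "xn--" anywhere in the URL */
    return score;
}
```

Note that both thresholds are strict: a hostname with exactly 4 dots or exactly 40 characters scores nothing for those rules.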

Hostname extraction: skip a leading http:// or https://, then take everything up to the next / (or end of string). With no scheme, the hostname starts at the beginning.
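One way to sketch the extraction step is a helper that returns a pointer into the original string plus a length, so nothing is copied. The name host_span is hypothetical; the sketch assumes url is non-NULL and that the scheme is matched case-sensitively, which the task statement does not specify.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the exercise API): returns a pointer to
 * the first hostname character inside url and stores the hostname length
 * in *len. Assumes url is non-NULL. */
static const char *host_span(const char *url, size_t *len) {
    const char *start = url;

    /* Skip a leading scheme if one is present. */
    if (strncmp(start, "https://", 8) == 0) {
        start += 8;
    } else if (strncmp(start, "http://", 7) == 0) {
        start += 7;
    }

    /* The hostname runs up to the next '/' or the end of the string. */
    *len = strcspn(start, "/");
    return start;
}
```

Because the span is not NUL-terminated, later checks on the hostname need to respect len rather than call strlen or strstr on the returned pointer.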

Input

url: a NUL-terminated URL string the grader passes. No fetch, DNS, or network — pure string analysis.

Output

Returns int: the number of rules matched (>= 0), or -1 if url is NULL.

Example

phishy_score("https://example.com/")         ==  0
phishy_score("https://example.com@evil.io/") >=  1   (contains @)
phishy_score("https://login123.com/")        >=  1   (digit run)
phishy_score("https://xn--exmple-cua.com/")  >=  1   (punycode)
phishy_score("https://paypal-secure.com/")   >=  1   (brand + dash)
phishy_score("https://paypal.com/")          ==  0   (brand, no dash)
phishy_score(NULL)                            ==  -1

Edge cases

A brand keyword without a - in the hostname scores 0 for rule 6.
Rule 6 checks only the hostname; brand keywords in the path do not count.
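The brand-plus-dash rule could be sketched as follows, working on a hostname span so that anything after the first / is never examined. The helper names span_has_word and rule_brand_dash are hypothetical, and the keyword list is assumed to be stored in lowercase so that only the hostname side needs case folding.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive search for a lowercase word inside
 * the first len bytes of host. Returns 1 if found, 0 otherwise. */
static int span_has_word(const char *host, size_t len, const char *word) {
    size_t wlen = strlen(word);
    if (wlen == 0 || wlen > len) return 0;
    for (size_t i = 0; i + wlen <= len; i++) {
        size_t j = 0;
        while (j < wlen &&
               tolower((unsigned char)host[i + j]) == word[j]) {
            j++;
        }
        if (j == wlen) return 1;
    }
    return 0;
}

/* Sketch of the brand rule: the hostname must contain BOTH a '-' and a
 * brand keyword. Returns 1 if the rule fires, 0 otherwise. */
static int rule_brand_dash(const char *host, size_t len) {
    static const char *const brands[] = {
        "paypal", "apple", "bank", "microsoft", "google", "amazon"
    };
    if (memchr(host, '-', len) == NULL) return 0;  /* no dash: cannot fire */
    for (size_t b = 0; b < sizeof brands / sizeof brands[0]; b++) {
        if (span_has_word(host, len, brands[b])) return 1;
    }
    return 0;
}
```

Passing only the hostname length is what keeps a path such as /paypal-login from contributing to the score.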

Rules

Static string only — no URL fetch, DNS lookup, or network I/O.

Why this matters

Structural smells are a cheap first layer in a phishing detector. Every point in the score traces back to a named rule, so obviously suspicious URLs can be flagged and explained before any ML model runs.

Input format

A NUL-terminated URL string url.

Output format

An int: total points (>=0) for matched rules, or -1 if url is NULL.

Constraints

Hostname bounded at 256 chars; brand match is case-insensitive; no network.

Starter code

int phishy_score(const char *url) {
    /* TODO */
    (void)url;
    return 0;
}

Common mistakes

Counting brand keywords that appear in the path rather than the hostname. Counting total digits instead of requiring a run of 3 or more consecutive digits. Forgetting to skip the scheme before extracting the hostname.
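The digit-run mistake is easiest to avoid with a counter that resets on every non-digit. The sketch below uses a hypothetical helper name, has_digit_run, and assumes the hostname is passed as a pointer plus length.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper for the digit-run rule: returns 1 if the first len
 * bytes of host contain 3 or more consecutive decimal digits, else 0. */
static int has_digit_run(const char *host, size_t len) {
    size_t run = 0;
    for (size_t i = 0; i < len; i++) {
        if (isdigit((unsigned char)host[i])) {
            if (++run >= 3) return 1;
        } else {
            run = 0;   /* any non-digit breaks the run */
        }
    }
    return 0;
}
```

With this shape, a1b2c3.com holds three digits in total but never three in a row, so the rule does not fire, while login123.com does trigger it.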

Edge cases to handle

A URL with no scheme. A hostname-only URL with no path. A URL with userinfo before the @.

Complexity

O(n) over the URL length.

Background lessons

Score a URL for phishing markers
