cybersecurity · intermediate · ~15 min · safe pentest lab

Score a URL for phishing structural smells

Heuristic scoring with explicit, auditable rules.

Challenge

Your job

Implement:

int phishy_score(const char *url);

Add 1 to the score for each rule the URL matches, and return the total. If url is NULL, return -1.

Rules (1 point each)

  1. The URL contains @.
  2. The hostname has more than 4 dots.
  3. The hostname is longer than 40 characters.
  4. The hostname has a run of >= 3 consecutive digits.
  5. The URL contains xn--.
  6. The hostname contains - AND a brand keyword from {paypal, apple, bank, microsoft, google, amazon} (case-insensitive).
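Rule 6 combines two conditions, and both apply to the hostname only. The following is a minimal sketch of one way to check it, assuming the hostname has already been located as a pointer plus a length. The helper names slice_contains_ci and rule_brand_with_hyphen are hypothetical and not part of the exercise.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first len bytes of host contain word,
 * ignoring ASCII case. word is assumed to be lowercase already. */
static int slice_contains_ci(const char *host, size_t len, const char *word) {
    size_t wlen = strlen(word);
    if (wlen == 0 || wlen > len) {
        return 0;
    }
    for (size_t i = 0; i + wlen <= len; i++) {
        size_t j = 0;
        while (j < wlen && tolower((unsigned char)host[i + j]) == word[j]) {
            j++;
        }
        if (j == wlen) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper for rule 6: a '-' AND any brand keyword, hostname only. */
static int rule_brand_with_hyphen(const char *host, size_t len) {
    static const char *const brands[] = {
        "paypal", "apple", "bank", "microsoft", "google", "amazon"
    };
    if (memchr(host, '-', len) == NULL) {
        return 0; /* no hyphen, so the rule cannot fire */
    }
    for (size_t k = 0; k < sizeof brands / sizeof brands[0]; k++) {
        if (slice_contains_ci(host, len, brands[k])) {
            return 1;
        }
    }
    return 0;
}
```

Passing an explicit length keeps the search from running past the hostname into the path, which is the bug the hints warn about.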

Hostname extraction

  • Skip past http:// or https:// if present.
  • The hostname runs from that point to the next / (or the end of the string).
  • If the URL has no scheme, the hostname starts at the beginning of the string.
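The extraction steps above can be sketched as a small helper that returns a non-owning view into the URL. This is an illustration under stated assumptions: the struct host_slice type and the find_host name are hypothetical, the scheme is matched in lowercase only, and the caller is assumed to have handled NULL already.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a view of the hostname inside the original URL string. */
struct host_slice {
    const char *start; /* first character of the hostname */
    size_t len;        /* characters before the next '/' or the terminating NUL */
};

/* Assumes url is non-NULL; the NULL -> -1 case belongs to the caller. */
static struct host_slice find_host(const char *url) {
    struct host_slice h;
    /* Skip a leading scheme if present (lowercase-only match is an assumption). */
    if (strncmp(url, "https://", 8) == 0) {
        url += 8;
    } else if (strncmp(url, "http://", 7) == 0) {
        url += 7;
    }
    h.start = url;
    h.len = strcspn(url, "/"); /* stops at the first '/' or at end of string */
    return h;
}
```

Because the slice is not NUL-terminated at the hostname boundary, every hostname rule should loop over len characters rather than calling string functions that read to the end of the URL.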

Hints

  1. (concept) Compute the hostname slice once; then run six tiny checks.
  2. (common bug) Counting brand keywords in the path. Only the hostname counts.
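Note that rules 1 and 5 are worded in terms of the whole URL, not the hostname, so they need no slice at all. A minimal sketch, assuming a non-NULL url and a case-sensitive match on xn-- (the exercise does not state the case rule); the name whole_url_rules is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: points earned from the two rules that scan the full URL.
 * Assumes url is non-NULL. */
static int whole_url_rules(const char *url) {
    int score = 0;
    if (strchr(url, '@') != NULL) {
        score++; /* rule 1: the URL contains '@' */
    }
    if (strstr(url, "xn--") != NULL) {
        score++; /* rule 5: the URL contains "xn--" (case-sensitive here) */
    }
    return score;
}
```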

Why this matters

Structural smells are a cheap first layer in a phishing detector: they need only string scanning, and every point in the score traces back to a named rule. A correct score lets a detector flag many obvious cases before any ML model runs.

Input format

A NUL-terminated URL string.

Output format

An integer score >= 0 (at most 6, one point per rule), or -1 if url is NULL.

Constraints

Hostname length is bounded at 256 characters. Brand matching is case-insensitive.

Starter code

int phishy_score(const char *url) {
    /* TODO */
    (void)url;
    return 0;
}

Common mistakes

Counting brand keywords in the path instead of only the hostname. Counting total digits instead of requiring a run of >= 3 consecutive digits. Forgetting to skip the scheme.
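The digit-run mistake is easy to avoid by resetting a counter whenever a non-digit appears. A minimal sketch, assuming the hostname is given as a pointer plus a length; the name has_digit_run and the min_run parameter are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if host[0..len) has at least min_run
 * consecutive ASCII digits, 0 otherwise. */
static int has_digit_run(const char *host, size_t len, size_t min_run) {
    size_t run = 0;
    for (size_t i = 0; i < len; i++) {
        if (isdigit((unsigned char)host[i])) {
            run++;
            if (run >= min_run) {
                return 1;
            }
        } else {
            run = 0; /* a non-digit breaks the run; total digits do not matter */
        }
    }
    return 0;
}
```

With min_run set to 3, a hostname such as a1b2c3.com contains three digits in total but no run, so rule 4 does not fire.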

Edge cases to handle

No scheme. Hostname-only URL (no path). URL with userinfo before @ (rule 1 applies).

Complexity

O(n) over the URL length.
