Safe Penetration Testing Labs · intermediate · ~15 min

Score a URL for phishing markers

Compute a heuristic phishing score from a URL string.

Overview

Find the host part between // and the next / (or the end of the string), check each signal in turn, and return the total.
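The host-finding step can be sketched as below. This is a minimal illustration, not the exercise's reference solution: host_span is a hypothetical helper name, and the fallback of treating a string without // as starting at the host is an assumption the lesson does not state.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the required API): locate the host
 * part of url and store its length in *len. Returns a pointer into url,
 * or NULL (with *len set to 0) when url is NULL. */
static const char *host_span(const char *url, size_t *len)
{
    if (url == NULL) {
        *len = 0;
        return NULL;
    }

    /* Skip past "//" when present; otherwise assume (our assumption)
     * that the string starts directly at the host. */
    const char *start = strstr(url, "//");
    start = (start != NULL) ? start + 2 : url;

    /* The host ends at the next '/' or at the end of the string. */
    const char *end = strchr(start, '/');
    *len = (end != NULL) ? (size_t)(end - start) : strlen(start);
    return start;
}
```

The helper returns a pointer plus a length instead of copying, so the caller can scan the host in place. Userinfo and ports stay inside the span, which matches the lesson's note that this is not a full URL parser.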

Why it matters

Most phishing detectors are layered: structural smells first, then ML, then live fetch. Layer one is the cheapest and quickest.

Lesson

Why this matters

Phishing URL detectors look at structural smells before they look at content: too many dots, embedded IPs, IDN punycode, @ in the authority, dashes inside the second-level domain, suspiciously long hostnames.

We score those signals. We do not fetch the URL.

Heuristics (one point each)

  • The URL contains @ (authority spoofing).
  • The hostname has more than 4 dots.
  • The hostname is longer than 40 characters.
  • The hostname contains a digit run of length >= 3 (e.g. login123).
  • The URL contains xn-- (IDN punycode; legitimate on its own, but often abused for lookalike domains).
  • The hostname contains a dash AND a known brand keyword (paypal, apple, bank, microsoft, google, amazon).
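Two of the rules above need more than a single character search: the digit run and the dash-plus-brand check. A possible sketch of each follows. The helper names (has_digit_run, span_contains, has_dashed_brand) are hypothetical, and case-sensitive matching is an assumption, since the lesson does not say whether PayPal should match paypal.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if host[0..len) contains at least three consecutive digits. */
static int has_digit_run(const char *host, size_t len)
{
    size_t run = 0;
    for (size_t i = 0; i < len; i++) {
        /* Cast to unsigned char: isdigit is undefined for negative values. */
        run = isdigit((unsigned char)host[i]) ? run + 1 : 0;
        if (run >= 3)
            return 1;
    }
    return 0;
}

/* Returns 1 if needle occurs inside host[0..len). Case-sensitive (assumption). */
static int span_contains(const char *host, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(host + i, needle, n) == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 if the host has a dash AND at least one brand keyword. */
static int has_dashed_brand(const char *host, size_t len)
{
    static const char *const brands[] = {
        "paypal", "apple", "bank", "microsoft", "google", "amazon"
    };
    if (memchr(host, '-', len) == NULL)
        return 0;
    for (size_t i = 0; i < sizeof brands / sizeof brands[0]; i++) {
        if (span_contains(host, len, brands[i]))
            return 1;
    }
    return 0;
}
```

Both helpers take an explicit length because the host is a slice of the URL, not a NUL-terminated string of its own. That also keeps brand keywords in the path from being scanned.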

Your job

Implement int phishy_score(const char *url). Add one point for each heuristic that matches and return the total. If url is NULL, return -1.
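One possible shape for the function is sketched below; try the exercise yourself before reading it. It is not the reference solution. The helper span_has is a hypothetical name, and both the case-sensitive brand matching and the handling of URLs without // are assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if needle occurs inside s[0..len), case-sensitive. */
static int span_has(const char *s, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(s + i, needle, n) == 0)
            return 1;
    return 0;
}

int phishy_score(const char *url)
{
    static const char *const brands[] = {
        "paypal", "apple", "bank", "microsoft", "google", "amazon"
    };
    if (url == NULL)
        return -1;

    /* Host: after "//" if present (else from the start, an assumption),
     * up to the next '/' or the end of the string. */
    const char *host = strstr(url, "//");
    host = (host != NULL) ? host + 2 : url;
    const char *slash = strchr(host, '/');
    size_t len = (slash != NULL) ? (size_t)(slash - host) : strlen(host);

    int score = 0;

    /* Whole-URL signals. */
    if (strchr(url, '@') != NULL) score++;
    if (strstr(url, "xn--") != NULL) score++;

    /* Hostname-only signals, gathered in one pass over the host. */
    size_t dots = 0, run = 0;
    int digit_run = 0, dash = 0, brand = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)host[i];
        if (c == '.') dots++;
        if (c == '-') dash = 1;
        run = isdigit(c) ? run + 1 : 0;
        if (run >= 3) digit_run = 1;
    }
    for (size_t i = 0; i < sizeof brands / sizeof brands[0]; i++)
        if (span_has(host, len, brands[i])) brand = 1;

    if (dots > 4) score++;
    if (len > 40) score++;
    if (digit_run) score++;
    if (dash && brand) score++;
    return score;
}
```

With this sketch, "http://paypal-login123.example.com/" scores 2 (a digit run, plus a dash with a brand keyword). "https://example.com/paypal-login" scores 0 because the keyword and dash sit in the path.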

Common mistakes

  • Scoring the path. Brand keywords in the path don't count — only the hostname.
  • Forgetting that xn-- is strictly a prefix of a hostname label. Searching for it as a substring of the whole URL is an approximation that can also match the path or query, but it is still useful as a flag here.

What this is NOT

  • A URL parser. Real parsing needs to handle userinfo, ports, IPv6 literals, percent-encoding — that's a follow-up exercise.
  • A blocklist consulter.

Summary

Six one-point rules: four checked against the hostname and two (@ and xn--) against the whole URL, summed into an integer score.
