Safe Penetration Testing Labs · beginner · ~12 min

Count unique subdomains in a wordlist

Walk a newline-separated wordlist and return the number of unique non-empty entries.

Overview

Split on \n, lowercase each non-empty line into a local buffer, and search a small array of entries already seen; append the line only if it is new.
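The "lowercase into a local buffer" step can be sketched as a small helper. This is a minimal sketch, not the lesson's reference code: the name `lower_line` and the 254-byte cap (the commonly cited 253-character maximum for a textual DNS name, plus a NUL) are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical cap: 253 characters (commonly cited maximum textual
   DNS name length) plus one byte for the terminating NUL. */
#define LINE_CAP 254

/* Copy `len` bytes starting at `src` into `dst`, lowercasing each byte.
   Returns 0 on success, or -1 (without reading src) if the line
   would not fit in LINE_CAP bytes. */
static int lower_line(char dst[LINE_CAP], const char *src, size_t len)
{
    if (len >= LINE_CAP)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* Cast to unsigned char: passing a negative char to tolower is undefined. */
        dst[i] = (char)tolower((unsigned char)src[i]);
    }
    dst[len] = '\0';
    return 0;
}
```

With `tolower` in the default "C" locale, only ASCII letters are folded, which matches ordinary hostname labels.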

Why it matters

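Deduplication is an early stage of most passive recon pipelines. Doing it in C teaches bounded-quadratic discipline: a linear lookup over a growing table is O(n²) in general, but capping the table at a fixed size keeps the worst case predictable.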

Lesson

Why this matters

Subdomain wordlists from tools like Amass, Subfinder, or ProjectDiscovery's dnsx pipelines are just \n-separated text. Before you fan out and query them, you deduplicate. We're going to write that deduplicator in C: it runs over a static buffer and makes no DNS lookups.

What the file looks like

www
api
mail
www       <- duplicate
admin
          <- empty line, ignored
api       <- duplicate
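The walk over that file can be sketched on its own before any deduplication. This minimal sketch assumes the sample above is held as a C string literal with the "<-" annotations dropped; `SAMPLE` and `count_nonempty_lines` are hypothetical names. It counts non-empty lines only, so it shows the blank-line skip and the unterminated last line without removing duplicates.

```c
#include <stddef.h>
#include <string.h>

/* The sample wordlist above (annotations dropped). Note the blank line,
   and that the last line, "api", has no trailing '\n'. */
static const char SAMPLE[] = "www\napi\nmail\nwww\nadmin\n\napi";

/* Count non-empty lines, treating the end of the string as a line
   terminator too. Duplicates are NOT removed; this only shows the walk. */
static int count_nonempty_lines(const char *list)
{
    int lines = 0;
    const char *p = list;
    while (*p != '\0') {
        const char *nl = strchr(p, '\n');           /* end of this line, if any */
        size_t len = nl ? (size_t)(nl - p) : strlen(p);
        if (len > 0)
            lines++;                                 /* empty lines are skipped */
        if (nl == NULL)
            break;                                   /* last line had no '\n' */
        p = nl + 1;
    }
    return lines;
}
```

On `SAMPLE` this walk sees six non-empty lines (www, api, mail, www, admin, api); deduplicating them case-insensitively leaves four.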

Your job

Implement int count_unique_domains(const char *list). Return the number of unique non-empty lines, ignoring case for comparison. NULL input → 0.

Bounded for this exercise: the table of seen entries holds at most 256 distinct entries. If the wordlist contains more than 256 distinct entries, return -1 so the auditor knows the cap was hit. Duplicates do not count toward the cap.
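One way to meet this contract is sketched below. It is a minimal sketch under stated assumptions, not the official solution: `MAX_UNIQUE`, `struct span`, and `span_equal_nocase` are hypothetical names, and instead of copying each line into a lowercase buffer it remembers where each distinct line starts inside `list` and compares case-insensitively in place, which avoids choosing a maximum line length. Case folding is ASCII-only via `tolower`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_UNIQUE 256   /* cap from the exercise statement */

/* One remembered entry: a span inside the caller's list, not a copy. */
struct span {
    const char *start;
    size_t len;
};

/* Case-insensitive equality of two byte spans (ASCII folding via tolower). */
static int span_equal_nocase(const char *a, size_t alen, const char *b, size_t blen)
{
    if (alen != blen)
        return 0;
    for (size_t i = 0; i < alen; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

int count_unique_domains(const char *list)
{
    struct span seen[MAX_UNIQUE];   /* about 4 KiB on a 64-bit target */
    int count = 0;

    if (list == NULL)
        return 0;

    const char *p = list;
    while (*p != '\0') {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);  /* last line may lack '\n' */

        if (len > 0) {                                    /* ignore empty lines */
            int found = 0;
            for (int i = 0; i < count; i++) {             /* linear lookup */
                if (span_equal_nocase(seen[i].start, seen[i].len, p, len)) {
                    found = 1;
                    break;
                }
            }
            if (!found) {
                if (count == MAX_UNIQUE)
                    return -1;                            /* 257th distinct entry: cap hit */
                seen[count].start = p;
                seen[count].len = len;
                count++;
            }
        }

        if (nl == NULL)
            break;
        p = nl + 1;
    }
    return count;
}
```

This sketch does not strip '\r' or surrounding spaces, so a line such as "www\r" from a CRLF file is treated as distinct from "www". Whether that matters depends on how your wordlists are produced.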

Common mistakes

  • Treating case-different lines as distinct (WWW vs www).
  • Counting empty lines.
  • Forgetting that the last line may not end in \n.

What this is NOT

  • A live DNS resolver. We never call getaddrinfo.
  • A wildcard / regex matcher.

Summary

Linear walk + linear lookup in a 256-slot table.
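Counting comparisons makes the "with a cap" part concrete. Let L be the number of non-empty lines (a symbol introduced here). Line j is compared against at most min(j-1, 256) stored entries, and each comparison costs at most the length of the line:

```latex
\text{comparisons} \;\le\; \sum_{j=1}^{L} \min(j-1,\,256) \;\le\; 256\,L,
\qquad
\text{filling the table: } \sum_{k=0}^{255} k = \frac{255 \cdot 256}{2} = 32\,640.
```

So the work grows quadratically only until the table is full, and linearly in L after that.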

Practice with these exercises