cybersecurity · beginner · ~12 min · safe pentest lab

Count unique non-empty subdomains in a wordlist

Linear-time deduplication against a fixed-capacity seen-set in pure C.

Challenge

Your job

Implement:

int count_unique_domains(const char *list);

Walk the \n-separated input, skipping empty lines and comparing lines case-insensitively. Return the number of unique entries, or 0 if list == NULL. At most 256 distinct entries are supported; if the input contains more, return -1.

Examples

  • "www\napi\nwww\nmail\n" → 3
  • "www\nWWW\n" → 1 (case-insensitive)
  • "\n\n\n" → 0
  • NULL → 0

Hints

  1. (concept) Tokenise on \n, lowercase each token into a stack buffer, then linear-scan a seen[256][64] table.
  2. (common bug) Reading past the end when the input doesn't end in \n. Treat \0 as a line terminator too, and never advance the pointer past it.
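
The tokenise-and-lowercase step from the hints can be sketched as a small helper. The name read_lower_line and its signature are hypothetical, not part of the exercise. The sketch assumes cap is at least 1 and silently drops characters beyond cap - 1, which the stated 63-character limit should make unreachable.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of the exercise API).
 * Copies the line starting at p into out, lowercased and NUL-terminated,
 * keeping at most cap - 1 characters (assumes cap >= 1).
 * Returns a pointer to the start of the next line, or to the terminating
 * '\0' if the input is exhausted. It never advances past '\0'. */
static const char *read_lower_line(const char *p, char *out, size_t cap) {
    size_t n = 0;

    while (*p != '\0' && *p != '\n') {
        if (n + 1 < cap) {
            /* Cast to unsigned char: tolower() is undefined for negative
             * values other than EOF. */
            out[n++] = (char)tolower((unsigned char)*p);
        }
        p++;
    }
    out[n] = '\0';

    if (*p == '\n') {
        p++; /* step over the separator, but never over '\0' */
    }
    return p;
}
```

A caller would loop while *p != '\0', call the helper, and skip any token whose first byte is '\0' (an empty line).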

Why this matters

Recon pipelines start by deduplicating wordlists. Writing the deduper in C teaches the linear-scan + small-table pattern.

Input format

A NUL-terminated string of \n-separated subdomain labels.

Output format

Unique count (0..256), or -1 if the cap is exceeded.

Constraints

At most 256 distinct entries. Each label is at most 63 characters, so it fits in a 64-byte buffer together with its terminating NUL.
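
These limits map directly onto a fixed-size table. Below is a minimal sketch of such a seen-set with a linear-scan insert. The names seen_set, seen_add, MAX_ENTRIES and MAX_LABEL are illustrative assumptions, and the sketch assumes the caller has already lowercased the label and that it is at most 63 characters long.

```c
#include <string.h>

#define MAX_ENTRIES 256 /* cap on distinct entries */
#define MAX_LABEL   64  /* 63 characters plus the terminating NUL */

/* Illustrative fixed-capacity set; set count to 0 before first use. */
struct seen_set {
    char entries[MAX_ENTRIES][MAX_LABEL];
    int count;
};

/* Returns 1 if label was added, 0 if it was already present,
 * and -1 if it is new but the table is already full.
 * Assumes label is lowercased and at most MAX_LABEL - 1 chars long. */
static int seen_add(struct seen_set *s, const char *label) {
    for (int i = 0; i < s->count; i++) {
        if (strcmp(s->entries[i], label) == 0) {
            return 0; /* duplicate: linear scan found a match */
        }
    }
    if (s->count == MAX_ENTRIES) {
        return -1; /* a 257th distinct entry exceeds the cap */
    }
    strncpy(s->entries[s->count], label, MAX_LABEL - 1);
    s->entries[s->count][MAX_LABEL - 1] = '\0';
    s->count++;
    return 1;
}
```

Checking for a duplicate before checking the cap matters: a full table should still accept input that only repeats entries it already holds.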

Starter code

int count_unique_domains(const char *list) {
    /* TODO */
    (void)list;
    return 0;
}

Common mistakes

Forgetting that the last line may lack a trailing \n. Counting empty lines as entries. Reading past the terminating NUL.

Edge cases to handle

Exactly 256 distinct entries (valid, returns 256) versus 257 (returns -1). Case variation such as www and WWW. A single line with no trailing newline.
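
For checking your own attempt against these cases, here is one possible solution sketch. It is not the official reference implementation. The macro names are assumptions, and truncating over-long labels to 63 characters is a choice made here because the exercise does not say how to treat them.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_UNIQUE 256 /* cap on distinct entries */
#define LABEL_BUF  64  /* 63 characters plus NUL */

int count_unique_domains(const char *list) {
    char seen[MAX_UNIQUE][LABEL_BUF]; /* 16 KiB on the stack */
    int count = 0;
    const char *p = list;

    if (list == NULL) {
        return 0;
    }

    while (*p != '\0') {
        char label[LABEL_BUF];
        size_t n = 0;
        int found = 0;

        /* Copy one line, lowercased; '\0' ends the last line too. */
        while (*p != '\0' && *p != '\n') {
            if (n < LABEL_BUF - 1) { /* assumption: truncate long labels */
                label[n++] = (char)tolower((unsigned char)*p);
            }
            p++;
        }
        label[n] = '\0';
        if (*p == '\n') {
            p++; /* skip the separator, never the terminator */
        }
        if (n == 0) {
            continue; /* empty line */
        }

        /* Linear scan of the entries seen so far. */
        for (int i = 0; i < count; i++) {
            if (strcmp(seen[i], label) == 0) {
                found = 1;
                break;
            }
        }
        if (found) {
            continue;
        }
        if (count == MAX_UNIQUE) {
            return -1; /* a 257th distinct entry */
        }
        memcpy(seen[count], label, n + 1); /* n + 1 copies the NUL */
        count++;
    }
    return count;
}
```

To exercise the cap, generate 257 distinct two-letter labels into a buffer and expect -1, then cut the buffer after the 256th label and expect 256.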

Complexity

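Time: O(input_len * unique_count), since each line is compared against every entry seen so far. With unique_count capped at 256, this is linear in input_len. Space: the seen table is bounded by 256 entries × 64 bytes = 16,384 bytes.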

Solve this exercise in the browser editor — compile and run against the test harness, no setup required.