Arrays & Strings · beginner · ~12 min
Find one string inside another; understand the algorithmic options.
Substring search is the most-used string primitive. Most production code uses strstr (good enough for short patterns) or memmem (binary-safe).
Logs, headers, configs, payloads — all string content where you ask 'does this contain X'. Knowing the cost matters when scanning gigabytes.
Naive. Walk haystack, at each position compare against needle. O(n*m).
KMP / Boyer-Moore. Pre-process needle for O(n) average or sub-linear average. Code is fiddly; rarely needed in pen-test work.
Multi-pattern. Aho-Corasick is the algorithm; YARA and ClamAV use variants.
Pentester mindset. Anti-malware engines, IDS rules, and log scanners are all multi-pattern search at scale.
Defensive coding habit. Always cap needle length; never assume haystack is NUL-terminated when it came from a network read — use memmem with explicit length.
const char *strstr(const char *haystack, const char *needle);
void *memmem(const void *h, size_t hn, const void *n, size_t nn); /* GNU/BSD */
C's standard strstr finds the first occurrence of needle in haystack with
a naive O(n*m) scan. For larger inputs use Boyer-Moore or KMP (sub-linear average).
For multi-pattern, use Aho-Corasick.
const char *p = strstr(haystack, needle);
if (p) printf("found at offset %td\n", p - haystack);
int contains(const char *line, const char *bad){
return strstr(line, bad) != NULL;
}
Print the offset where the match was found and the surrounding ~20 chars; lets you sanity-check matches by eye.
strstr walks until NUL. For binary buffers use memmem or write a length-bounded variant.
Log filters, malware signature scanners, web-server URL routers, intrusion detection.
strstr for case-insensitive search. 2. Implement count_occurrences. 3. Reject any line containing '../' (path-traversal allow-list check).Use strstr for short patterns; memmem for binary; Aho-Corasick for many patterns. Always cap lengths.