Structs & Data Structures · intermediate · ~20 min
Build and reason about hash tables; pick a sensible hash function.
Hash tables give average-case O(1) lookup by converting keys to integer indices via a hash function, then resolving collisions inside each bucket. They're the most common dictionary type in C (and in everything built on C).
Every C program that needs key/value lookup uses a hash table somewhere. The Linux kernel's caches, glibc's symbol resolver, the C runtime of every higher-level language — all hash tables.
Hash function. Maps key bytes to a uniform-looking integer. FNV-1a and DJB2 are common simple choices; for security-critical code use SipHash (HashDoS-resistant).
Bucket index. hash % nbuckets. If nbuckets is a power of two, hash & (nbuckets - 1) is faster.
Collision resolution. Chaining keeps a linked list of entries per bucket. Open addressing probes nearby buckets. Chaining is simpler; open addressing is faster on modern caches.
Pentester mindset. Naive hash tables can be DoS'd: attacker submits many keys that all hash to the same bucket. The fix is SipHash + a per-process secret seed.
Defensive coding habit. For any externally-supplied keys, use SipHash with a random seed. Track load factor; rehash above 75%.
typedef struct entry { char *key; int val; struct entry *next; } entry_t;
entry_t **buckets; /* array of head pointers */
size_t nbuckets;
A hash function maps a variable-length key to a fixed-size integer (the hash). A hash table uses that integer modulo the bucket count to locate a slot. Collisions are inevitable; we handle them with chaining (linked lists per bucket) or open addressing.
/* FNV-1a 32-bit hash */
uint32_t fnv1a(const char *s){
uint32_t h = 2166136261u;
while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
return h;
}
int table_get(entry_t **buckets, size_t nb, const char *key, int *out){
uint32_t h = fnv1a(key);
for (entry_t *e = buckets[h % nb]; e; e = e->next){
if (strcmp(e->key, key) == 0){ *out = e->val; return 1; }
}
return 0;
}
key % bucket_count directly with a bucket_count that's a power of 2 plus a weak hash — predictable collisions.Print the load factor (entries / buckets). If it exceeds 0.75, rehash to 2x.
Each bucket's linked list must be freed at destroy time; both the key (strdup'd) and the entry struct.
Environment lookup (getenv), DNS caches, language interpreters' symbol tables, every web server's route table.
Hash table = hash function + bucket array + collision strategy. FNV-1a for simple cases, SipHash for security-sensitive ones.