cybersecurity · beginner · ~15 min

Identify the file format from the first 4 bytes

Magic-byte detection — the universal forensic primitive.

Challenge

Implement int detect_format(const unsigned char *buf, int len) returning one of:

  • 1 if it looks like an ELF binary (\x7fELF)
  • 2 if it looks like a Mach-O binary (32- or 64-bit big-endian \xfe\xed\xfa\xce or \xfe\xed\xfa\xcf, or the byte-reversed little-endian forms \xce\xfa\xed\xfe and \xcf\xfa\xed\xfe; accept all 4)
  • 3 if it looks like a Windows PE (MZ at offset 0)
  • 4 if it looks like a Java class file (\xca\xfe\xba\xbe)
  • 5 if it looks like wasm (\x00asm)
  • 0 otherwise (or if len < 4)
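A minimal sketch of one possible solution, assuming the magic values listed above (with the byte-reversed Mach-O magics treated as the little-endian forms). It compares raw byte prefixes with memcmp, which, unlike strcmp or strncmp, does not stop at the NUL byte that opens the wasm magic. The table name kMagics and struct magic are illustrative, not part of the exercise.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table: magic prefix, how many bytes to compare,
 * and the code detect_format should return for it. */
struct magic {
    const char *bytes;  /* may contain '\0', so the length is explicit */
    size_t n;           /* 2 for "MZ", 4 for everything else */
    int code;
};

static const struct magic kMagics[] = {
    { "\x7f" "ELF",       4, 1 },  /* ELF (split literal: 'E' is a hex digit) */
    { "\xfe\xed\xfa\xce", 4, 2 },  /* Mach-O 32-bit, big-endian */
    { "\xfe\xed\xfa\xcf", 4, 2 },  /* Mach-O 64-bit, big-endian */
    { "\xce\xfa\xed\xfe", 4, 2 },  /* Mach-O 32-bit, little-endian */
    { "\xcf\xfa\xed\xfe", 4, 2 },  /* Mach-O 64-bit, little-endian */
    { "MZ",               2, 3 },  /* Windows PE (DOS "MZ" header) */
    { "\xca\xfe\xba\xbe", 4, 4 },  /* Java class file */
    { "\x00" "asm",       4, 5 },  /* WebAssembly (split: 'a' is a hex digit) */
};

int detect_format(const unsigned char *buf, int len) {
    /* Spec: fewer than 4 bytes means "unknown". This guard also keeps
     * every memcmp below inside buf[0..3]. */
    if (buf == NULL || len < 4)
        return 0;
    for (size_t i = 0; i < sizeof kMagics / sizeof kMagics[0]; i++) {
        if (memcmp(buf, kMagics[i].bytes, kMagics[i].n) == 0)
            return kMagics[i].code;
    }
    return 0;
}
```

Design note: Mach-O universal ("fat") binaries begin with \xca\xfe\xba\xbe, the same bytes as a Java class file, so with only 4 bytes this sketch (like the spec) reports them as 4; tools such as file(1) look at the bytes that follow to tell the two apart.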

Why this matters

Forensic triage of an unknown file almost always starts with magic-byte detection. The file(1) utility does exactly this, driven by its magic database; you'll build the same primitive in about 30 lines of C.
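In a triage tool the buffer usually comes straight from the start of a file. A minimal sketch using standard C stdio; the helper name read_magic is hypothetical. It returns how many bytes were actually read, so a 2-byte file produces len == 2 and is classified as unknown instead of being read past its end.

```c
#include <stdio.h>

/* Hypothetical helper: read up to 4 header bytes from the current
 * position of fp (call it on a freshly opened file).
 * Returns the number of bytes read (0..4), or -1 on a read error. */
int read_magic(FILE *fp, unsigned char out[4]) {
    size_t got = fread(out, 1, 4, fp);
    if (got < 4 && ferror(fp))
        return -1;
    /* A file shorter than 4 bytes gives got < 4; pass this count as len
     * so the classifier never looks at bytes that were not filled in. */
    return (int)got;
}
```

Typical use, with fp opened in binary mode via fopen(path, "rb"):

```c
unsigned char hdr[4];
int n = read_magic(fp, hdr);
int fmt = (n < 0) ? 0 : detect_format(hdr, n);
```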

Input format

A byte buffer buf and its length len in bytes (len may be 0).

Output format

An int: one of 0, 1, 2, 3, 4, or 5.

Constraints

Read at most buf[0..3], and never read buf[i] for any i >= len.

Starter code

int detect_format(const unsigned char *buf, int len) { /* TODO */ (void)buf; (void)len; return 0; }

Common mistakes

Indexing buf[3] without checking len >= 4.
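One way to make that mistake impossible is to route every comparison through a helper that checks the length first. A minimal sketch; the name has_prefix is hypothetical. It uses memcmp rather than strncmp because the wasm magic begins with a NUL byte: strncmp stops comparing at the first NUL, so it would report a match for any buffer starting with \x00.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero iff buf[0..n-1] equals magic[0..n-1].
 * The length check runs first, so memcmp never reads past len. */
static int has_prefix(const unsigned char *buf, int len,
                      const void *magic, size_t n) {
    if (buf == NULL || len < 0 || (size_t)len < n)
        return 0;
    /* memcmp compares raw bytes, embedded '\0' included. */
    return memcmp(buf, magic, n) == 0;
}
```

With this helper, each format check becomes a single call such as has_prefix(buf, len, "MZ", 2), and the bounds check lives in one place.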

Edge cases to handle

Empty buffer (len == 0) and very short buffers (len 1 to 3): both must return 0 without reading past len.

Complexity

O(1) time and space: at most 4 bytes are inspected, whatever the value of len.
