File Handling · beginner · ~15 min

Log parsing — line iteration + classification

Stream-process a log file and aggregate per-key statistics.

Overview

Log parsing is one of the most common tasks in operational pen testing and incident response. The recipe: open file → loop with fgets → classify each line → update counters → emit a summary at end of stream.

Why it matters

Logs are the historical record. They contain forensic evidence, anomaly markers, and the trail of any incident. Reading them quickly and accurately is a daily skill for defenders and red-teamers alike.

Core concepts

Stream, don't slurp. Read line-by-line; never load multi-gigabyte logs into memory.

Strip CR/LF. fgets keeps the trailing newline, and CRLF-terminated logs (for example, files written on Windows) also carry a CR before it. Strip both before any string compare.
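A minimal sketch of the strip step as a helper (chomp is a hypothetical name, not a standard C function). Passing "\r\n" as the reject set to strcspn also handles CRLF logs. If you strip only "\n", a trailing \r remains, and a strcmp against the expected text then fails.

```c
#include <string.h>

/* Truncate s at its first CR or LF byte, which removes a trailing
 * "\n" or "\r\n" left by fgets. Returns the new length of s. */
static size_t chomp(char *s) {
    size_t n = strcspn(s, "\r\n");  /* length of the prefix with no CR/LF */
    s[n] = '\0';
    return n;
}
```

Because the helper cuts at the first CR or LF, a bare CR in the middle of a line also truncates it there. For log parsing, that is usually the safer outcome.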

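Cap line length. A cap such as 4 KB bounds memory use, but fgets does not reject a longer line. It returns that line in buffer-sized pieces, and each piece would otherwise be parsed as a separate line. Check whether each read ended in a newline, and discard or flag the rest of an oversized line.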
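One way to enforce the cap, sketched under the assumption that oversized lines should be skipped rather than parsed. The approach reads with fgets, checks whether a newline arrived, and if not, consumes the rest of the line. read_capped_line is a hypothetical helper. The buffer is zeroed before each read, so the memchr newline check still works when a line contains NUL bytes.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (capacity cap, which must fit in an int).
 * Returns  1 for a complete line (newline included if one was present),
 *          0 for an overlong line, whose remainder is read and discarded,
 *         -1 at end of file or on a read error. */
static int read_capped_line(FILE *fp, char *buf, size_t cap) {
    memset(buf, 0, cap);                   /* memchr then sees only this read */
    if (!fgets(buf, (int)cap, fp)) return -1;
    if (memchr(buf, '\n', cap)) return 1;  /* newline arrived: line fit */
    int c = fgetc(fp);                     /* no newline yet: peek one byte */
    if (c == '\n' || c == EOF) return 1;   /* line ended exactly at the cap or at EOF */
    while ((c = fgetc(fp)) != EOF && c != '\n') { }  /* drain the oversized line */
    return 0;
}
```

A caller might count the 0 returns as a separate "oversized" statistic. A burst of them may itself be an anomaly worth reporting.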

Pentester mindset. Attackers inject \r\n into fields they control (filename, User-Agent, etc.) to forge log lines. Detect by scanning user-controlled fields for control bytes before logging.
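The detection step can be a plain byte scan. This sketch uses has_control_bytes, a hypothetical helper. It flags every ASCII control byte, not just CR and LF, because NUL, tab, and escape sequences can also distort how a line is stored or displayed. Narrow the test if your format legitimately allows tabs. The length is passed explicitly so bytes after an embedded NUL are still checked.

```c
#include <stddef.h>

/* Return 1 if the first len bytes of field contain any ASCII control
 * byte (0x00-0x1F, which includes CR and LF, or 0x7F), else 0. */
static int has_control_bytes(const unsigned char *field, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (field[i] < 0x20 || field[i] == 0x7F) return 1;
    }
    return 0;
}
```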

Defensive coding habit. Any field you log that came from outside must have CR/LF stripped first; this prevents log injection (CWE-117).
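A minimal sketch of neutralizing a field before it is written to a log. This version escapes the bytes instead of stripping them, so the attempted injection stays visible to whoever reads the log later. sanitize_for_log is a hypothetical helper. CR and LF become the two-character sequences \r and \n, other control bytes become \xHH, and the output is truncated to fit the destination buffer.

```c
#include <stddef.h>
#include <stdio.h>

/* Copy src into dst (capacity cap, which must be > 0), escaping CR, LF and
 * other control bytes so one event can never spill onto a second log line.
 * Output is truncated at a whole escape boundary and always NUL-terminated. */
static void sanitize_for_log(const char *src, char *dst, size_t cap) {
    size_t o = 0;
    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        char esc[5];   /* longest piece is "\xHH" plus NUL */
        size_t n;
        if (*p == '\r')      { esc[0] = '\\'; esc[1] = 'r'; n = 2; }
        else if (*p == '\n') { esc[0] = '\\'; esc[1] = 'n'; n = 2; }
        else if (*p < 0x20 || *p == 0x7F) {
            snprintf(esc, sizeof esc, "\\x%02X", (unsigned)*p);
            n = 4;
        }
        else                 { esc[0] = (char)*p; n = 1; }
        if (o + n >= cap) break;          /* no room for this piece plus NUL */
        for (size_t i = 0; i < n; i++) dst[o++] = esc[i];
    }
    dst[o] = '\0';
}
```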

Syntax notes

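#include <stdio.h>
#include <string.h>

FILE *fp = fopen(path, "r");
if (!fp) { perror(path); return 1; }   /* always check fopen */
char line[4096];
while (fgets(line, sizeof line, fp)) {
    line[strcspn(line, "\r\n")] = 0;  /* strip trailing CR/LF */
    /* classify / aggregate */
}
fclose(fp);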

Lesson

Logs are line-delimited text. A parser walks them one line at a time and classifies each line, either by severity (info/warn/error) or by key (per IP, per user, per route). It then updates a counter or running statistic for that class.
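A sketch of the classify and aggregate steps for severity buckets. It assumes, hypothetically, that each line carries one of the keywords ERROR, WARN or INFO. The enum and function names are illustrative. Because strstr matches anywhere in the line, a message body that merely mentions "ERROR" would be misclassified. A real parser should match the level at its fixed position in the log format.

```c
#include <string.h>

/* Hypothetical severity buckets; adapt the matching to your log format. */
enum severity { SEV_INFO, SEV_WARN, SEV_ERROR, SEV_OTHER, SEV_COUNT };

/* Classify an already-stripped line: return the most severe keyword
 * found anywhere in it, or SEV_OTHER if none is present. */
static enum severity classify(const char *line) {
    if (strstr(line, "ERROR")) return SEV_ERROR;
    if (strstr(line, "WARN"))  return SEV_WARN;
    if (strstr(line, "INFO"))  return SEV_INFO;
    return SEV_OTHER;
}

/* Aggregate: bump the counter for this line's bucket. */
static void tally(unsigned long counts[SEV_COUNT], const char *line) {
    counts[classify(line)]++;
}
```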

Code examples

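#include <string.h>

unsigned long count_failed = 0;
char line[4096];
while (fgets(line, sizeof line, fp)) {      /* fp: an open log, as above */
    if (strstr(line, "Failed password")) count_failed++;  /* sshd auth failure */
}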

Line by line

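#include <string.h>

/* Copy the text between "from " and " port" (the client address in an
 * sshd "... from <ip> port <n> ..." line) into out, NUL-terminated.
 * Returns 1 on success, 0 if a marker is missing or out is too small. */
static int extract_ip(const char *line, char *out, size_t cap){
    const char *from = strstr(line, "from ");
    if (!from) return 0;
    from += 5;                               /* skip past "from " */
    const char *port = strstr(from, " port");
    if (!port) return 0;
    size_t n = (size_t)(port - from);        /* length of the address text */
    if (n + 1 > cap) return 0;               /* leave room for the NUL */
    memcpy(out, from, n); out[n] = 0;
    return 1;
}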
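Extending extract_ip into a per-key aggregator gives a per-IP failed-login counter. This sketch assumes sshd-style "Failed password ... from <ip> port <n>" lines. The fixed-size table with linear search keeps the example short; MAX_IPS, IP_LEN, struct ip_table and count_failed_by_ip are all hypothetical names. For large logs, a hash table would avoid scanning the table on every line.

```c
#include <string.h>

#define MAX_IPS 256    /* hypothetical table size for this sketch */
#define IP_LEN  64     /* room for IPv4 or IPv6 text plus NUL */

struct ip_count { char ip[IP_LEN]; unsigned long n; };
struct ip_table { struct ip_count rows[MAX_IPS]; size_t used; };

/* Same extraction as extract_ip: text between "from " and " port". */
static int extract_ip(const char *line, char *out, size_t cap) {
    const char *from = strstr(line, "from ");
    if (!from) return 0;
    from += 5;
    const char *port = strstr(from, " port");
    if (!port) return 0;
    size_t n = (size_t)(port - from);
    if (n + 1 > cap) return 0;
    memcpy(out, from, n); out[n] = 0;
    return 1;
}

/* Count one failed login against its source address. Returns 1 if counted,
 * 0 if the line is not a failure, has no address, or the table is full. */
static int count_failed_by_ip(struct ip_table *t, const char *line) {
    char ip[IP_LEN];
    if (!strstr(line, "Failed password") || !extract_ip(line, ip, sizeof ip)) return 0;
    for (size_t i = 0; i < t->used; i++) {
        if (strcmp(t->rows[i].ip, ip) == 0) { t->rows[i].n++; return 1; }
    }
    if (t->used == MAX_IPS) return 0;          /* table full: drop (or grow it) */
    strcpy(t->rows[t->used].ip, ip);           /* fits: extract_ip bounded it */
    t->rows[t->used].n = 1;
    t->used++;
    return 1;
}
```

After the read loop, walk rows[0] through rows[used - 1] and print the addresses whose n exceeds a chosen threshold. The result is a basic brute-force report.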

Common mistakes

  • Reading the whole file into memory. Stream instead.
  • Treating log content as trusted: attackers can inject CR/LF and other control bytes into fields they control.
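Treating content as untrusted also applies to the extraction step itself. extract_ip takes the first "from " in the line. In an sshd "Failed password for invalid user <name> from ..." line, however, the username is chosen by the attacker. A username such as "x from 10.0.0.1 port 22" would therefore plant a fake address earlier in the line. The following is a hardened sketch, assuming the real client address is the last " from ... port " pair on the line. extract_ip_strict is a hypothetical name. The character check accepts only hex digits, '.' and ':', which is a loose filter for IPv4/IPv6 literals, not full validation.

```c
#include <ctype.h>
#include <string.h>

/* Like extract_ip, but anchor on the LAST " from " in the line and accept
 * only characters that can appear in an IPv4/IPv6 literal.
 * Returns 1 and fills out on success, 0 otherwise. */
static int extract_ip_strict(const char *line, char *out, size_t cap) {
    const char *from = NULL;
    for (const char *p = strstr(line, " from "); p; p = strstr(p + 1, " from "))
        from = p;                                  /* remember the last match */
    if (!from) return 0;
    from += 6;                                     /* skip " from " */
    const char *port = strstr(from, " port ");
    if (!port) return 0;
    size_t n = (size_t)(port - from);
    if (n == 0 || n + 1 > cap) return 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)from[i];
        if (!isxdigit(c) && c != '.' && c != ':') return 0;  /* not an IP literal */
    }
    memcpy(out, from, n); out[n] = '\0';
    return 1;
}
```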

Debugging tips

Run your parser against an empty file, a one-line file, a file without a trailing newline, and a file with embedded NUL bytes. Each of those breaks naive parsers.
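A sketch of a fixture harness for the cases listed above. count_lines and fixture are hypothetical helpers. tmpfile() is standard C and returns an anonymous file that is removed when it is closed. count_lines reads byte by byte, so it serves as a reference count to compare against your fgets-based parser on the same fixtures.

```c
#include <stdio.h>

/* Count lines: an empty file has zero lines, and a final line without a
 * trailing newline still counts. Reading with fgetc means NUL bytes and
 * very long lines do not affect the count. */
static unsigned long count_lines(FILE *fp) {
    unsigned long lines = 0;
    int c, prev = '\n';               /* '\n' so an empty file counts as 0 */
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') lines++;
        prev = c;
    }
    if (prev != '\n') lines++;        /* unterminated last line */
    return lines;
}

/* Write len bytes of data (NUL bytes allowed) to a temporary file and
 * rewind it for reading. Returns NULL if tmpfile() fails. */
static FILE *fixture(const char *data, size_t len) {
    FILE *fp = tmpfile();
    if (!fp) return NULL;
    fwrite(data, 1, len, fp);
    rewind(fp);
    return fp;
}
```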

Memory safety

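Whenever fgets succeeds, it NUL-terminates the buffer. It stores at most size - 1 characters followed by a NUL, so the size argument includes the NUL byte, and fgets(buf, sizeof buf, fp) is the correct idiom. If fgets returns NULL, do not treat the buffer as holding a new line. Embedded NUL bytes are copied through unchanged, so strlen on the result can under-report how much was read.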
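A small demonstration of the size rule (read_chunk_len is a hypothetical helper). With a 4-byte buffer, fgets stores at most 3 characters plus the NUL. The rest of the line stays in the stream and arrives on later calls.

```c
#include <stdio.h>
#include <string.h>

/* Read one fgets chunk into buf (size bytes, NUL included) and return its
 * strlen, or -1 if fgets returned NULL (end of file or read error). */
static long read_chunk_len(FILE *fp, char *buf, int size) {
    if (fgets(buf, size, fp) == NULL) return -1;   /* no new data in buf */
    return (long)strlen(buf);
}
```

Feeding it the line "abcdef\n" with a 4-byte buffer yields the chunks "abc", "def" and "\n", followed by -1 at end of file.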

Real-world uses

SSH brute-force detection, web-server analytics, anomaly alerts, billing aggregation, incident-response timeline reconstruction.

Practice tasks

  1. Count lines in a fixture log.
  2. Per-IP failed-login counter.
  3. Redact PII (emails) from each line.

Summary

Stream, classify, aggregate. Always cap line length; always strip CR/LF; always treat content as untrusted.
