linux-sysprog · intermediate · ~45 min

Final Project: mini-shell — argv tokenizer

Stateful character-by-character parsing with multiple modes (in-word / in-quote).

Challenge

Implement int shell_split(const char *line, char **argv, int max_argv):

  • Splits on whitespace.
  • Treats "..." as one token (without the quote characters).
  • Writes token pointers into argv[0..ret-1] and sets argv[ret] to NULL.
  • Must not modify line (it is const). Allocate each token on the heap, strdup-style; the caller frees every token.
  • Returns the number of tokens (at most max_argv-1, leaving one slot for the terminating NULL), or -1 on parse error (unterminated quote).
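
Because a token is a slice of line rather than a whole NUL-terminated string, plain strdup cannot copy it directly. The sketch below shows one way to make the heap copy without touching line. The helper name dup_span is hypothetical and not part of the exercise API; POSIX strndup would be a reasonable alternative where it is available.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the exercise API): copy the len bytes
 * starting at start into a fresh NUL-terminated heap string.
 * Returns NULL if malloc fails. The source buffer is only read. */
static char *dup_span(const char *start, size_t len)
{
    assert(start != NULL);
    char *tok = malloc(len + 1);   /* +1 for the terminating NUL */
    if (tok == NULL)
        return NULL;
    memcpy(tok, start, len);
    tok[len] = '\0';
    return tok;
}
```

A parser would call this once per finished token, store the result in the next argv slot, and leave it to the caller to free() each pointer later.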

Why this matters

A POSIX shell is at heart a tokenizer + a fork/exec loop. The trickiest bit is splitting a command line into argv while respecting quotes — every shell on every UNIX-like OS does this dance, and getting it right is a great pointer/string exercise.

Input format

line is a NUL-terminated ASCII string. argv has space for max_argv pointers, one of which is taken by the terminating NULL.

Output format

Number of tokens written, or -1 on parse error.

Constraints

Use a small state machine. Don't use strtok: it writes NUL bytes into its input (which is const here) and has no notion of quotes.

Starter code

#include <stddef.h>
int shell_split(const char *line, char **argv, int max_argv) { /* TODO */ return 0; }

Common mistakes

Getting tangled in escape sequences (they are not required, so keep it simple); forgetting to NULL-terminate argv; not freeing already-allocated tokens on the error path.
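
The error-path leak is the easiest of these to overlook: by the time an unterminated quote is detected, earlier tokens may already be on the heap. One possible cleanup routine is sketched below; abort_split is a hypothetical name, and clearing the slots after freeing is a defensive choice rather than something the exercise requires.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: undo a partially completed split.
 * Frees the count tokens stored so far, clears their slots so the caller
 * never sees dangling pointers, and returns -1 for the parser to pass on. */
static int abort_split(char **argv, int count)
{
    assert(argv != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        free(argv[i]);
        argv[i] = NULL;
    }
    return -1;
}
```

Inside the parser, the failure branch can then be a single statement such as return abort_split(argv, n);, where n is the number of tokens stored so far.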

Edge cases to handle

Empty line returns 0. Trailing whitespace and two consecutive spaces must not produce extra tokens. A quoted empty string "" is well-formed, not an unterminated quote.

Complexity

O(n) time in the length of line; memory is one heap allocation per token.
