Computer & OS Fundamentals · beginner · ~9 min

Compression and archives

Distinguish archiving from compression and know the common formats and their risks.

Overview

Archiving bundles files into one (tar); compression shrinks data (gzip, bzip2, xz); zip does both. Lossless compression works by removing redundancy, so high-entropy data barely shrinks, which hints that it is encrypted or already packed. Archives also carry real risks: zip-slip path traversal and zip bombs.

Why it matters

Backups and exfiltrated data often arrive as archives, so handling them is routine. Zip-slip and zip bombs are concrete vulnerability classes in any code that extracts user-supplied archives, which makes them targets for both offensive testing and defensive code review.

Core concepts

  • Archive vs compress: tar bundles; gzip/xz shrink; zip does both.
  • .tar.gz vs .zip: the Unix norm vs the cross-platform norm.
  • Entropy tell: data that won't compress is likely encrypted or packed.
  • Zip-slip: malicious paths escape the extraction directory.
  • Zip bomb: tiny input, huge expansion → denial of service.

Lesson

Archiving bundles many files into one; compression shrinks data by removing redundancy. They're often combined.

Archive vs compress

  • tar bundles files into one .tar (no compression by itself).
  • gzip/bzip2/xz compress a single stream → .gz, .bz2, .xz.
  • .tar.gz (a.k.a. .tgz) = tar then gzip — the Unix norm.
  • zip does both archiving and compression in one container (the Windows/cross-platform norm).
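To see the layering concretely, here is a minimal sketch using Python's standard-library tarfile, gzip, and zipfile modules. The file names and contents are invented placeholders. It builds a .tar.gz and a .zip in memory, then peels off the gzip layer of the .tar.gz to expose an ordinary tar stream underneath.

```python
import gzip
import io
import tarfile
import zipfile

# Invented sample files; names and contents are placeholders.
FILES = {"notes.txt": b"hello world\n" * 200, "data.csv": b"a,b,c\n1,2,3\n" * 200}


def make_tar_gz(files: dict[str, bytes]) -> bytes:
    """tar bundles the files into one stream; gzip then compresses that stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def make_zip(files: dict[str, bytes]) -> bytes:
    """zip stores and compresses each entry inside a single container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


tgz = make_tar_gz(FILES)
zipped = make_zip(FILES)
inner_tar = gzip.decompress(tgz)  # strip the gzip layer, leaving a plain tar stream

print("gzip magic:", tgz[:2].hex())       # gzip streams start with 1f 8b
print("tar magic:", inner_tar[257:262])   # tar headers carry "ustar" at offset 257
print("zip magic:", zipped[:4])           # zip local file headers start with PK\x03\x04
print("sizes (tar, tar.gz, zip):", len(inner_tar), len(tgz), len(zipped))
```

One design difference worth noticing: gzip sees a single stream (the whole tar file), so it can exploit redundancy across files, while zip compresses each entry separately. That is one reason a .tar.gz often comes out smaller for many similar small files, and also why pulling one file out of a .zip is cheap but out of a .tar.gz means decompressing from the start.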

How compression works (briefly)

Lossless compression replaces repeated patterns with shorter references, so the original can be rebuilt exactly. Data that is already random or already compressed (encrypted blobs, JPEGs) has little redundancy left and barely shrinks. That makes a useful tell: high-entropy data that won't compress is often encrypted or already packed.
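A rough way to apply this tell, sketched in Python with only the standard library: measure byte-level Shannon entropy (0 to 8 bits per byte) and the zlib compression ratio side by side. Here os.urandom stands in for ciphertext, and the sample data is made up; treat this as an illustrative heuristic, not a calibrated detector.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 = constant, 8.0 = uniform random)."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def compression_ratio(data: bytes) -> float:
    """zlib (DEFLATE) output size divided by input size; near 1.0 means it didn't shrink."""
    return len(zlib.compress(data, 9)) / len(data)


samples = {
    "repetitive text": b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 1000,
    "random (ciphertext stand-in)": os.urandom(50_000),
}

for label, blob in samples.items():
    print(f"{label:30} entropy={shannon_entropy(blob):.2f} bits/byte "
          f"ratio={compression_ratio(blob):.3f}")
```

Expect the repetitive sample to shrink dramatically and the random one to stay at roughly its original size (DEFLATE adds a few bytes of framing). A JPEG or a .gz file would score about as high as ciphertext, so high entropy says "packed or encrypted", not which one.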

Security relevance

  • Loot often arrives as archives: backups (.tar.gz), exfil bundles, source dumps. Listing and extracting them is routine work.
  • Zip-slip: a malicious archive with paths like ../../etc/cron.d/x can write outside the extraction directory if the extractor doesn't sanitise paths. Despite the name, the same flaw affects tar and other archive formats; it is a real vulnerability class.
  • Zip bombs: tiny archives that decompress to enormous sizes, exhausting disk or memory as a denial-of-service attack.
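A defensive extractor has to check every member before writing it. The sketch below is a hypothetical safe_extract for zip files built on Python's zipfile module; the 100 MiB output cap and 100:1 ratio limit are arbitrary placeholder values, not recommended thresholds. It resolves each member's real destination and refuses anything that lands outside the extraction directory (zip-slip), rejects entries whose declared compression ratio is suspicious, and counts the bytes actually written, because header sizes can be forged (zip bomb). Python's own ZipFile.extract already strips absolute paths and ".." components, per its documentation; the check is spelled out here because the same pattern matters in libraries that don't.

```python
import io
import os
import zipfile

# Hypothetical limits for illustration only; tune them for your own use case.
MAX_TOTAL_BYTES = 100 * 1024 * 1024  # cap on bytes actually written (100 MiB)
MAX_RATIO = 100                      # reject entries claiming more than 100:1 expansion
CHUNK = 64 * 1024


class UnsafeArchive(Exception):
    """Raised when an archive member fails a safety check."""


def _resolve_inside(root: str, member: str) -> str:
    """Return the real destination path, or raise if it escapes root (zip-slip)."""
    target = os.path.realpath(os.path.join(root, member))
    try:
        inside = os.path.commonpath([root, target]) == root and target != root
    except ValueError:  # e.g. paths on different drives on Windows
        inside = False
    if not inside:
        raise UnsafeArchive(f"path escapes extraction dir: {member!r}")
    return target


def safe_extract(zip_bytes: bytes, dest: str) -> list[str]:
    """Extract zip_bytes into dest, refusing zip-slip paths and likely zip bombs."""
    root = os.path.realpath(dest)
    written_paths: list[str] = []
    written_bytes = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            target = _resolve_inside(root, info.filename)
            # Declared sizes come from the archive headers, so this is only a first filter.
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise UnsafeArchive(f"suspicious compression ratio: {info.filename!r}")
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                # Count real decompressed bytes, since headers can lie about file_size.
                while chunk := src.read(CHUNK):
                    written_bytes += len(chunk)
                    if written_bytes > MAX_TOTAL_BYTES:
                        raise UnsafeArchive("decompressed output exceeds size limit")
                    dst.write(chunk)
            written_paths.append(target)
    return written_paths
```

On failure this sketch leaves any already-written files in place; a production extractor would typically extract into a staging directory and discard it on error. Legitimate files (large logs, sparse data) can exceed a 100:1 ratio, so the ratio limit is a trade-off rather than a rule.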

Summary

tar archives, gzip/xz compress, zip does both, and high-entropy data resists compression. Archive handling is everyday work, and extractors must defend against zip-slip path traversal and zip bombs.