Explaining .tar.gz file format

The aim of this pageđź“ť is to explain file compression and archiving based on the particular example of combining tar and gzip in software distribution.

Pavol Kutaj
2 min readApr 4, 2024
  • A tar file (short for Tape Archive) bundles multiple files into a single archive.
  • Initially designed for sequential I/O devices (like magnetic tapes) lacking their own file systems.
  • Retains metadata (permissions, ownership, timestamps).

Why Does Tar Exist?

  • Historical context: Early tape drives and raw disks had variable-length data blocks, leading to wasted space.
  • Sequential I/O devices: Tar suited devices where files were written sequentially.
  • Collecting files: Ideal for bundling related files together.

Advantages of Tar:

  • Simplicity: Easy to use.
  • Cross-Platform: Works across Unix-like systems.
  • No Compression: Focuses on bundling files.
  • Software Distribution: Commonly used for distributing software.

Combining Tar and Gzip:

  • Why?: To get both bundling and compression.
  • Result: The .tar.gz file contains a tar archive compressed with gzip.

Usage:

  • Creating: tar -czvf archive.tar.gz file1 file2 ...
  • Extracting: tar -xzvf archive.tar.gz

ZIP vs. Tar/Gzip:

ZIP:

  • Retains metadata.
  • Efficient for individual files.
  • Widely supported but not ideal for software distribution.

.tar.gz:

  • Preserves metadata.
  • Efficient storage and distribution.
  • Popular in Unix environments.

LINKS

ANKI

Question 1

What does a tar file do?

Answer 1

A tar file bundles multiple files into a single archive, preserving metadata.

Question 2

Why was tar initially designed?

Answer 2

To write data to sequential I/O devices (like magnetic tapes) lacking their own file systems.

Question 3

What does .tar.gz combine?

Answer 3

A tar archive (bundling) with gzip compression.

Question 4

Why is ZIP not ideal for software distribution?

Answer 4

ZIP lacks seamless metadata preservation (Linux permissions) and efficient compression for large software packages.

Question 5

What’s the benefit of .tar.gz in Unix environments?

Answer 5

It combines bundling, metadata preservation, and efficient compression.

--

--

Pavol Kutaj

Today I Learnt | Infrastructure Support Engineer at snowplow.io with a passion for cloud infrastructure/terraform/python/docs. More at https://pavol.kutaj.com