Archive formats and utilities

An archive is a collection of things in a container. In our case the things are files and their associated metadata. The container can be punch tape, floppy disk, magnetic tape, or another file.

An archive tool manipulates the container, while the format specifies how data is stored in the container.

The two key early programs were CPIO (CoPy In Out) and TAR. CPIO is mostly used in a pipeline, where it is fed a list of file names on its input, reads their metadata and contents from storage, and writes them as an archive to its output.

cat MyListOfFile | cpio -ocB > /dev/rmt0

Tar, on the other hand, usually takes on its command line the name of a file in which to create the archive and a list of directories to include.

tar -c -f MyArchive.tar ./book1/ ./book2/

CPIO also provided a copy mode where the target is a directory rather than a device, and the list of files presented on its input is copied to the new directory. In this case no archive is created so archive format restrictions do not apply. Limits are those of the system on which it is run.

cat MyListOfFile | cpio -pdum ../NewDirectory 

One major issue these days is format limits: the limit on the size of files to be included, the size of the archive to be created, and what can be included as names. The effective limits are always the least of the format limits, the operating system limits, and the resource limits of the devices involved.
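The "least of the three limits" rule can be sketched as a small shell function. The function name and the sample figures are illustrative assumptions, not taken from any particular tool or system.

```shell
# effective_limit: print the smallest of the limits given as arguments.
# All values must use the same unit (bytes here) and fit in a shell integer.
effective_limit() {
    min=$1
    shift
    for limit in "$@"; do
        if [ "$limit" -lt "$min" ]; then
            min=$limit
        fi
    done
    printf '%s\n' "$min"
}

# Hypothetical figures: an 8 GiB format limit, a 2 GiB operating system
# file-size limit, and a 4 GiB device.
format_limit=$((8 * 1024 * 1024 * 1024))
os_limit=$((2 * 1024 * 1024 * 1024))
device_limit=$((4 * 1024 * 1024 * 1024))
effective_limit "$format_limit" "$os_limit" "$device_limit"
```

With these made-up figures the operating system limit wins, so a generous archive format does not help on its own.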

Back in the mid 1970s there were only a small number of computers, and they counted their total disk storage in megabytes; many had only the 160 KB stored on a floppy. An archive format that allowed for files 2,000 times the size of the then available storage devices was never seen as a constraint.

As the techniques used to make silicon chips were applied to drive heads, the head size shrank, and data density rose. Some of the increase in data density was traded off for smaller drives.

At the data centre end of the market a server in a rack with 100 8 TB drives is perfectly practical.

Archive formats, and the metadata they can store, have had to change.

Side note on prefixes indicating size

Computing mostly operates in binary, and 2^10 = 1024, so in the early days people working with computers tended to ignore the 2.4% difference between 1000 and 1024 and borrowed the SI prefixes.

Système International d'Unités - SI - International system of units

Back in 1960 the scientific community standardized a consistent set of units for various physical properties, and a set of prefixes for dealing with large multiples or small fractions of a unit.

IEC

By the time you reach gigabyte quantities, the difference between 10^9 and 2^30 reaches 7.4%. Drive manufacturers were using GB and TB to mean 10^9 and 10^12 bytes, while memory manufacturers were using them for 2^30 and 2^40 bytes.

IEC responded to the growing issue by standardising a new set of prefixes:

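| Symbol | Name | Size | Equivalent |
|---|---|---|---|
| KiB | Kibibyte | 2^10 bytes | 1,024 bytes |
| MiB | Mebibyte | 2^20 bytes | 1,024 kibibytes |
| GiB | Gibibyte | 2^30 bytes | 1,024 mebibytes |
| TiB | Tebibyte | 2^40 bytes | 1,024 gibibytes |
| PiB | Pebibyte | 2^50 bytes | 1,024 tebibytes |
| EiB | Exbibyte | 2^60 bytes | 1,024 pebibytes |
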
http://physics.nist.gov/cuu/Units/binary.html
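
The percentages quoted above can be checked with a line of awk. This is only a sketch; the function name pct_diff is made up for the illustration.

```shell
# pct_diff BITS DIGITS: percentage by which 2^BITS exceeds 10^DIGITS,
# rounded to one decimal place.
pct_diff() {
    awk -v b="$1" -v d="$2" 'BEGIN { printf "%.1f\n", (2^b / 10^d - 1) * 100 }'
}

pct_diff 10 3    # kibi versus kilo: prints 2.4
pct_diff 30 9    # gibi versus giga: prints 7.4
```

The gap keeps widening with each prefix, which is why the ambiguity became harder to ignore as drives grew.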

History of archive standards.

The POSIX 2 standard in 1992 replaced cpio with pax. Pax keeps the pipeline and copy modes of cpio, but also supports ACLs and extended attributes. It introduced its own archive format, with support for much larger files, and a large enough number of user ids for a system with multiple user accounts for everyone in the world. For backwards compatibility and portability it could read and write selected cpio and tar formats.

History of standardization attempts.

| Year | Standard and requirements |
|---|---|
| 1988 | IEEE Std 1003.1-1988, Standard Data Interchange format. Compliant systems to include cpio and tar. These utilities must each support the agreed standard format; additional formats optional. |
| 1989-1994 | GNU wrote their own tar utility, with its own proprietary format. :-( |
| 1992 | POSIX.2, IEEE Std 1003.2-1992. Added the pax utility with support for cpio and ustar archive formats. |
| 1997 | Single UNIX Specification version 2. Compliant systems to include pax (Portable Archive eXchange), a single utility that can read and write data in the cpio and tar formats standardized in 1988. Working party to finalize an additional format to support larger files and systems. |
| 2001 | POSIX.1-2001, or IEEE Std 1003.1-2001, or Single UNIX Specification version 3. For pax (Portable Archive eXchange), an additional format (pax) that supports very large files must be supported. It is a development of the ustar format with additional header types. |
| 2004 | POSIX.1-2004 or IEEE Std 1003.1-2004 |
| 2008 | POSIX.1-2008 or IEEE Std 1003.1-2008 |

When the developers wrote their first archive program in 1977, they thought ahead. In 1977 a large computer could have a whole MB of storage, on a platter about the size of an LP. They designed a format that would work on a system with 65,000 users, creating files 2,000 times the size of the largest disk then available.

Tar was developed independently a couple of years later, with broadly similar capabilities. Give a problem to several independent sets of developers, and they will generally come up with similar solutions.

It was mostly the bits that had not been envisaged that caused issues over the following 20 years.

By the late 1990s we had several dozen proprietary modifications for various systems, and they were mostly incompatible. Archive programs would by default generate archives in the proprietary format, which caused lots of problems when you wanted to access the archive from a different computer to the one that created it. Most had the ability to read, and sometimes write, one or more of the standard formats.

When doing this, anything that did not fit the target standard was dropped, or the file creation failed.

The revised POSIX standard in 2001 introduced PAX as an extensible archive format. The originator may add records to the archive describing additional attributes that had not been thought of in 2001. The receiver can process those attributes it recognises and ignore the rest.
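
As a rough sketch of what such a record looks like: in the pax interchange format each extended header record is a line of the form "length keyword=value", where the decimal length counts the whole record, including the length digits themselves and the trailing newline. The helper name pax_record below is made up, and the sketch assumes plain ASCII text, since the shell's ${#var} counts characters rather than bytes.

```shell
# pax_record KEYWORD VALUE: print one extended header record.
pax_record() {
    body=" $1=$2"                 # separator space, then keyword=value
    rest=$(( ${#body} + 1 ))      # plus the trailing newline
    total=$(( rest + ${#rest} ))  # first guess at the full record length
    # Adding the length digits can itself add a digit (for example 9 -> 11),
    # so repeat until the number is consistent with its own width.
    while [ $(( rest + ${#total} )) -ne "$total" ]; do
        total=$(( rest + ${#total} ))
    done
    printf '%s%s\n' "$total" "$body"
}

pax_record uid 3000000    # prints: 15 uid=3000000
```

A user id of 3000000 is used here because it is too large for the 2,097,151 limit of the older tar formats, which is exactly the kind of value an extended record exists to carry.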

Summary format list. Note that the codes are used as column headings in the next table. As a general complication, different programs, or in some cases different versions of the same program, use the same format code for different formats. :-(

| Code | Year | Format | Maximum file size | Maximum user/group id | Maximum pathname length | dev_t size in bits (major/minor) | ACL | Extended attributes / SELinux |
|---|---|---|---|---|---|---|---|---|
| bin | 1977 | Original CPIO binary format, developed by AT&T | 2 GB | 65535 | 256 | 16 | No | No |
| tar | ? | Original tar format | 2 GB | 65535 | 155/100? | 16? | No | No |
| odc | 1980 | Original CPIO ascii format, introduced with System III, IEEE 1003.1, universally supported | 8 GB | 262143 | 256 | 18 | No | No |
| uv7tar | 1980 | Edition 7 of tar utility. 1980 | 8 GB | 2,097,151 | 99 | N/A | No | No |
| star | 1985 | Modified tar format, used by star. | 8 GB | 2,097,151 | 256(99) | 24/24 | No | No |
| cpio | 1988 | POSIX cpio standard | 8 GB | 262143 | 262143 | 18 | No | No |
| ustar | 1988? | IEEE/POSIX1003/IEC-9945-1-1988 Standard Data Interchange format. | 8 GB | 2,097,151 | 256(99) | 21/21 | No | No |
| svr4 | ? | CPIO ascii | 4 GB | 4.3e9 (2^32-1) | 1024 | 32/32 | | |
| crc | ? | CPIO ascii + checksum | 4 GB | 4.3e9 (2^32-1) | 1024 | 32/32 | | |
| dec | ? | odc modified by DEC | 8 GB | 262,143 | 256 | 24/24 | | |
| gnutar | 1989 | Format used by PD tar/GNU tar <=1.12 | ? | ? | ? | ? | No | No |
| bar | ? | | 8 GB | 2,097,151 | 427 | 21 | | |
| sgi | ? | bin modified by SGI | 9 EB | 65,535 | 256 | 14/18 | | |
| sco | ? | svr4 modified by SCO | 9 EB | 4.3e9 (2^32-1) | 1024 | 32/32 | | |
| scocrc | ? | crc modified by SCO | 9 EB | 4.3e9 (2^32-1) | 1024 | 32/32 | | |
| zip | 1990? | | 9 EB | 4.3e9 (2^32-1) | 60000 | 32 | | |
| cray | 1993? | bin modified by Cray, all fields 64 bit. | 9 EB | 1.8e19 (2^64-1) | 65535 | 64 | | |
| xstar | 1994 | Extended version of star | 9 EB | 1.8e19 (2^64-1) | 65535 | 21/21 | | |
| sun | 1997? | ustar extended by SUN | 9 EB | 1.8e19 (2^64-1) | 65535 | 63/63 | | |
| xustar | 1998 | xstar without tar signature. | 9 EB | 1.8e19 (2^64-1) | 65535 | 21/21 | | |
| gnu | 1999? | uv7tar? extended by GNU 1.13 | 9 EB | 1.8e19 (2^64-1) | 65535 | 63/63 | No | No |
| pax | 2001? | POSIX-1003.1-2001 Standard Data Interchange format. (extended version of sun/ustar?) | 9 EB | 1.8e19 (2^64-1) | 65535 | 21/21 | Yes | ? |
| ? | 2004 | GNU tar 1.15 | 9 EB | 1.8e19 (2^64-1) | 65535 | 63/63 | Yes | Yes |
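
Several of the limits in the table fall straight out of the width of the header fields. The field widths in the sketch below are stated from memory of the header layouts rather than taken from the table, so treat them as assumptions: 16-bit binary fields in bin, 6 octal digits for ids in odc, and 7 octal digits for ids plus 11 octal digits for the size in ustar. The helper name octal_max is made up, and the shell is assumed to have 64-bit arithmetic.

```shell
# octal_max N: largest value that fits in N octal digits, 8^N - 1 = 2^(3N) - 1.
octal_max() {
    echo $(( (1 << (3 * $1)) - 1 ))
}

octal_max 6                 # odc id field:      262143
octal_max 7                 # ustar id field:    2097151
octal_max 11                # 11-digit size:     8589934591, just under 8 GiB
echo $(( (1 << 16) - 1 ))   # 16-bit binary id:  65535
echo $(( (1 << 31) - 1 ))   # signed 32-bit size: 2147483647, just under 2 GiB
```

The 4.3e9 and 1.8e19 entries follow the same pattern for unsigned 32-bit and 64-bit fields.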

OS tool Compatibility and support table.

This is not the whole story. For some formats, notably the POSIX standard from 2001, the standard provides an extension mechanism for additional attributes or metadata. Currently this includes access control lists, extended attributes, SELinux contexts, ...

Just because a particular tool can read or write a format does not mean it supports all of the extensions allowed in the format. When creating an archive, a tool can be asked to write those attributes it knows about. When restoring, tools mainly just ignore any attributes recorded in the archive that they do not understand.
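
The "restore what you know, ignore the rest" behaviour can be sketched as follows. It assumes the pax style of extended record, one "length keyword=value" line per attribute, and it simplifies by reading line by line, so values containing newlines are not handled. The function name, the list of known keywords, and the EXAMPLE.colour keyword are all hypothetical.

```shell
# read_known_attrs: read extended header records from standard input,
# act on the keywords this tool understands, and skip everything else.
read_known_attrs() {
    while IFS=' ' read -r length record; do
        keyword=${record%%=*}     # text before the first '='
        value=${record#*=}        # text after the first '='
        case $keyword in
            path|uid|gid|size|mtime)
                printf 'restore %s=%s\n' "$keyword" "$value" ;;
            *)
                printf 'ignore %s\n' "$keyword" ;;
        esac
    done
}

printf '%s\n' '15 uid=3000000' '23 EXAMPLE.colour=blue' | read_known_attrs
```

The second record is silently dropped on restore, which is harmless for a data file but may matter a great deal if the attribute was an ACL or a security context.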

For transferring or archiving data files, extended attributes are unlikely to matter very much. However, for backing up an operating system that you expect to be able to restore and run, they do.

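| OS | Repo | Path | Version | bin | odc | svr4 | crc | uv7tar | ustar | xstar | pax | sun | gnutar | zip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AIX 3.x | ? | /usr/bin/cpio | ? | default | -c | | | | | | | | | |
| AIX 4.2 | ? | /usr/bin/cpio | ? | default | -c | | | | | | | | | |
| AIX 4.2 | ? | /usr/bin/pax | ? | | -x cpio | | | | -x ustar | | | | | |
| AIX 4.3 | ? | /usr/bin/cpio | ? | default | -c | | | | | | | | | |
| AIX 4.3 | ? | /usr/bin/pax | ? | | -x cpio | | | | -x ustar | | -x pax | | | |
| AIX 5.x | ? | /usr/bin/cpio | ? | default | -c | | | | | | | | | |
| AIX 5.x | ? | /usr/sysv/bin/cpio | ? | | -c, -Hodc | | -Hcrc | -Htar | -Hustar | | | | | |
| AIX 5.x | ? | /usr/bin/pax | ? | | -x cpio | | | | -x ustar | | -x pax | | | |

/usr/bin/tar on AIX 4.2 and AIX 4.3 (version ?): according to the manual, old <2GB tar format only.
/usr/bin/tar on AIX 5.x (version ?): default.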
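
| OS | Repo | Path | Version | bin | odc | svr4 | crc | uv7tar | ustar | xstar | pax | sun | gnutar | zip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fedora 3 | base | /bin/cpio | 2.5 | -H bin | -H odc | -c, -H newc | -H crc | -H tar | -H ustar | | | | | |
| Fedora 3 | base | /usr/bin/pax | 3.0 | -x bcpio | -x cpio | | -x sv4crc | -x tar | -x ustar | | | | | |
| Fedora 3 | base | /usr/bin/star | 1.5 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
| Fedora 3 | base | /bin/tar | 1.14 | | | | | -o | | | | | | |
| Fedora 3 | base | /usr/bin/ustar | 1.5 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
| Fedora 3 | base | /usr/bin/zip | 2.3 | | | | | | | | | | | default |
| CentOS 5.x | ? | /bin/cpio | 2.6 | -H bin | -H odc | -c, -H newc | -H crc | -H tar | -H ustar | | | | | |
| CentOS 5.x | base | /bin/gtar | 1.15 | | | | | --format v7 | --format ustar | | --format posix | | --format oldgnu | |
| CentOS 5.x | ? | /usr/bin/pax | 3.4 | -x bcpio | -x cpio | | -x sv4crc | -x tar | -x ustar | | | | | |
| CentOS 5.x | ? | /usr/bin/spax | ? | | -x cpio | | | | -x ustar | | -x pax | | | |
| CentOS 5.x | base | /usr/bin/star | 1.5 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
| CentOS 5.x | base | /usr/bin/ustar | 1.5 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
| CentOS 6.x | base | /bin/cpio | 2.11 | -H bin | -H odc | -c, -H newc | -H crc | -H tar | -H ustar | | | | | |
| CentOS 6.x | base | /usr/bin/pax | 3.4 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
| CentOS 6.x | base | /usr/bin/spax | 1.5 | | -x cpio | | | | -x ustar | | -x pax | | | |
| CentOS 6.x | base | /bin/gtar | 1.23 | | | | | -H v7 | -H ustar | | -H pax, -H posix | | -H gnu, -H oldgnu | |
| CentOS 6.x | base | /usr/bin/star | 1.5 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
| CentOS 6.x | base | /usr/bin/ustar | 1.5 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
| CentOS 7.x | base | /usr/bin/bsdcpio | 3.1.2 | | default, -c, -H odc, -H cpio | -H newc | | | -H ustar | | -H pax | | | |
| CentOS 7.x | base | /bin/cpio | 2.11 | -H bin | -H odc | -c, -H newc | -H crc | -H tar | -H ustar | | | | | |
| CentOS 7.x | base | /usr/bin/pax | 3.4 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
| CentOS 7.x | base | /usr/bin/opax | 3.4 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
| CentOS 7.x | base | /usr/bin/spax | 1.5 | | -x cpio | | | | -x ustar | | -x pax | | | |
| CentOS 7.x | base | /bin/gtar | 1.26 | | | | | -H v7 | -H ustar | | -H pax, -H posix | | -H gnu, -H oldgnu | |
| CentOS 7.x | base | /usr/bin/star | 1.5 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
| CentOS 7.x | base | /usr/bin/ustar | 1.5 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
| Fedora 17 | fedora | /usr/bin/bsdcpio | 3.1.4 | | default, -c, -H odc, -H cpio | -H newc | | | -H ustar | | -H pax | | | |
| Fedora 17 | fedora | /bin/cpio | 2.11 | -H bin | -H odc | -c, -H newc | -H crc | -H tar | -H ustar | | | | | |
| Fedora 17 | base | /usr/bin/pax | 3.4 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
| Fedora 17 | base | /usr/bin/spax | 1.5.1 | | -x cpio | | | | -x ustar | | -x pax | | | |
| Fedora 17 | base | /bin/gtar | 1.26 | | | | | -H v7 | -H ustar | | -H pax, -H posix | | -H gnu, -H oldgnu | |
| Fedora 17 | base | /usr/bin/star | 1.5.1 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
| Fedora 17 | base | /usr/bin/ustar | 1.5.1 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
| Fedora 17 | base | /usr/bin/zip | 3.0 | | | | | | | | | | | default |
| Fedora 24 | base | /usr/bin/bsdcpio | 3.2.1 | | default, -c, -H odc, -H cpio | -H newc | | | -H ustar | | -H pax | | | |
| Fedora 24 | base | /bin/cpio | 2.12.3 | -H bin | -H odc | -c, -H newc | -H crc | -H tar | -H ustar | | | | | |
| Fedora 24 | base | /usr/bin/pax | 3.4.23 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
| Fedora 24 | base | /usr/bin/opax | 3.4.23 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
| Fedora 24 | base | /usr/bin/spax | 1.5.3 | | -x cpio | | | | -x ustar | | -x pax | | | |
| Fedora 24 | base | /bin/gtar | 1.28 | | | | | -H v7 | -H ustar | | -H pax, -H posix | | -H gnu, -H oldgnu | |
| Fedora 24 | base | /usr/bin/star | 1.5.3 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
| Fedora 24 | base | /usr/bin/ustar | 1.5.3 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
| CentOS 8.x | baseos | /bin/cpio | 2.12 | -H bin | -H odc | -c, -H newc | -H crc | -H tar | -H ustar | | | | | |
| CentOS 8.x | epel | /usr/bin/pax | 1.5.3 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
| CentOS 8.x | epel | /usr/bin/opax | 3.4 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
| CentOS 8.x | baseos | /usr/bin/spax | 1.5.3 | | -x cpio | | | | -x ustar | | -x pax | | | |
| CentOS 8.x | baseos | /bin/gtar | 1.30 | | | | | -H v7 | -H ustar | | -H pax, -H posix | | -H gnu, -H oldgnu | |
| CentOS 8.x | baseos | /usr/bin/star | 1.5.3 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
| CentOS 8.x | baseos | /usr/bin/ustar | 1.5.3 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
| CentOS 8.x | baseos | /usr/bin/bsdtar | 3.3.3 | | | | | | | | | | | |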

Applications

GNU tar


yum install tar

Feature set, option names, and archive format have changed a lot over the years.

1999 version 1.13 new archive format

2003 version 1.14 new archive formats, support for sparse files

2004 version 1.15 new archive format, supporting acl, xattr, selinux

2007 version 1.17 rework option names....

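| Version | Date | OS | Default format | oldgnu | v7 | ustar | gnu | pax | Sparse | acls | xattrs | selinux |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.11 | ? | | oldgnu | | | | | | | | | |
| 1.12 | 25/11/1998 | | | | old, portability., =oldarchive | | | | | | | |
| ? | 25/4/1997 | | | | | | | | | | | |
| 1.13.25 | 1999 | RedHat 7.3, RedHat 9.0, SUSE 9.1 | gnu | | | | | | | | | |
| ? | 13/11/2003 | | ? | -H oldgnu | -H v7 | -H ustar | | | | | | |
| 1.14 | 10/5/2004 | Fedora 3, SUSE 9.2 | ? | | -o | | | | --sparse | N/A | N/A | N/A |
| 1.15.1 | 21/12/2004 | Centos 5.x, Fedora 5, Fedora 6, Fedora 7 | gnu | --format oldgnu | -o, --old-archive, --portability, --format v7 | --format ustar | --format gnu | --format posix, --posix | --sparse | --acls | --xattrs | --selinux |
| 1.17 | 8/6/2007 | Fedora 8 | ? | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax, -H posix | | | | |
| 1.19 | 10/10/2009 | Fedora 9 | ? | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax, -H posix | | | | |
| 1.20 | ? | Fedora 10, Debian {CoLinux}? | ? | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax, -H posix | | | | |
| 1.22 | ? | Fedora 12 | ? | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax, -H posix | | | | |
| 1.23 | 10/3/2010 | Centos 6.x | gnu | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax, -H posix | --sparse | --acls | --xattrs | --selinux |
| 1.26 | 12/3/2011 | Fedora 16, Fedora 17 | gnu | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax, -H posix | --sparse | --acls | --xattrs | --selinux |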

star (ustar)

The package star provides the commands /usr/bin/star, and /usr/bin/ustar


yum install star

The star command supports creating and reading a wide range of archive formats. In all cases the restrictions of the format apply.

When invoked as ustar, the default output format is ustar and the restrictions of that format apply: maximum file size 8 GB, no ACLs, and no extended attributes.
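
A pre-flight check against the ustar limits might look like the sketch below. The function name fits_ustar is made up. The id and pathname limits are the figures from the format table; the size limit assumes the 8 GB figure means 11 octal digits, and the pathname test is a simplification that only looks at the total length, not at how the name would be split inside the header.

```shell
# fits_ustar SIZE_BYTES ID PATHNAME: print "ok" if all three values fit the
# ustar limits, otherwise name the first field that does not and return 1.
fits_ustar() {
    max_size=$(( (1 << 33) - 1 ))   # assumed: 8 GB limit as 11 octal digits
    max_id=2097151                  # maximum user/group id from the table
    max_path=256                    # maximum pathname length from the table
    if [ "$1" -gt "$max_size" ]; then echo "size too large"; return 1; fi
    if [ "$2" -gt "$max_id" ]; then echo "id too large"; return 1; fi
    if [ "${#3}" -gt "$max_path" ]; then echo "path too long"; return 1; fi
    echo "ok"
}

fits_ustar 1024 1000 ./book1/chapter1.txt       # prints: ok
fits_ustar 1024 3000000 ./book1/chapter1.txt    # prints: id too large
```

Files that fail such a check need one of the extended formats instead.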

When invoked as star the default output format is xstar. So maximum file size 8 GB, no ACLs, or extended attributes.

Specifying pax as the output format gives a file that complies with the standard without any extended headers.

Sparse files are not supported in tar, ustar, suntar, pax, or any cpio variant. I think that leaves star, gnutar, xstar, xustar, and exustar.

ACL access control lists are only supported for the format exustar.

Extended attributes are only supported when the format is exustar.

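| Version | Date | OS | Default format | oldgnu | v7 | ustar | gnu | pax | Sparse | acls | xattrs | selinux |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 2008? | CentOS 5.x | | | | | | | | | | |
| 1.5 | 2008? | CentOS 6.x | | | | | | | | | | |
| 1.5.2-5 | 2008? | Fedora 19 | | | | | | | | | | |
| 1.5.2-13 | 2008? | CentOS 7.x | xstar | | | | | | | | | |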

spax (pax)

The package spax provides the commands /usr/bin/spax, and /usr/bin/pax


yum install spax
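
| Version | Date | OS | Default format | cpio | ustar | pax | Sparse | acls | xattrs | selinux |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 2008? | CentOS 5.x | | | | | | | | |
| 1.5 | 2008? | CentOS 6.x | | | | | | | | |
| 1.5.2-5 | 2008? | Fedora 19 | | | | | | | | |
| 1.5.2-13 | 2008? | CentOS 7.x | pax | -x cpio | -x ustar | -x pax | | | | |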

opax

Provides the commands /usr/bin/opax, and /usr/bin/pax.


yum install pax

Confusingly, this version of the pax command does not support the 2001 standardized Portable Archive eXchange format (PAX). It does, however, support additional formats.

Ported from BSD to SuSE and released in 2001 as version 3.0; version 3.4 came in 2005.

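| Version | Date | OS | Default format | cpio | ustar | gnu | pax |
|---|---|---|---|---|---|---|---|
| 3.4-2 | 2008? | CentOS 5.x | | | | | |
| 3.4-10 | 2008? | CentOS 6.x | | | | | |
| 3.4-16 | 2008? | Fedora 19 | | | | | |
| 3.4-19 | 1994? | CentOS 7.x | ustar | -x cpio | -x ustar | N/A | N/A |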

bsdtar


yum install bsdtar
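
| Version | Date | OS | Default format | oldgnu | v7 | ustar | gnu | pax | Sparse | acls | xattrs | selinux |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3.1.2-10 | ? | CentOS 7.x | | | | | | | | | | |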