An archive is a collection of items held in a container. In our use the items are files and their associated metadata. The container can be punched tape, a floppy disk, magnetic tape, or another file.
An archive tool manipulates the container, while the format specifies how data is stored in the container.
The two key early programs were CPIO (CoPy In Out) and TAR. CPIO is mostly used in a pipeline, where it is fed a list of file names on its input, reads their metadata and contents from storage, and writes them as an archive to its output.
Tar, on the other hand, usually takes the name of the file in which to create the archive, and a list of directories to include, on its command line.
CPIO also provides a copy mode, where the target is a directory rather than a device, and the files listed on its input are copied to the new directory. In this case no archive is created, so archive format restrictions do not apply. The only limits are those of the system on which it is run.
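The two invocation styles described above can be imitated with Python's standard `tarfile` module. This is only a sketch of the interfaces, not of cpio or tar themselves: the function names are made up for illustration, and the archive written is whatever tar format `tarfile` uses by default.

```python
import tarfile

def archive_from_names(names, out):
    """cpio style: file names arrive on an input stream,
    the archive leaves on an output stream."""
    with tarfile.open(fileobj=out, mode="w|") as archive:
        for line in names:
            name = line.rstrip("\n")
            if name:
                # Add exactly the named item, without descending into it.
                archive.add(name, recursive=False)

def archive_from_directories(archive_path, directories):
    """tar style: the archive file and the directories to include
    are both named explicitly."""
    with tarfile.open(archive_path, mode="w") as archive:
        for directory in directories:
            archive.add(directory)  # recurses into the directory
```

Calling `archive_from_names(sys.stdin, sys.stdout.buffer)` from a small script would let it sit in a pipeline after a command that prints one file name per line, which is roughly how cpio is normally driven.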
One major issue these days is format limits: the limit on the size of files to be included, the size of the archive to be created, and what can be included as names. The effective limit is always the least of the format limit, the operating system limit, and the resource limits of the devices involved.
Back in the mid 1970s only a small number of computers counted their total disk storage in megabytes; many had only the 160 KB that fitted on a floppy. An archive format that allowed for files 2,000 times the size of the storage devices then available was never seen as a constraint.
As the techniques used to make silicon chips were applied to drive heads, the head size shrank, and data density rose. Some of the increase in data density was traded off for smaller drives.
At the data centre end of the market, a server in a rack with 100 drives of 8 TB each is perfectly practical.
Archive formats, and the metadata they can store, have had to change.
Computing mostly operates in binary, and 2^10 = 1024, so in the early days people working with computers tended to ignore the 2.4% difference between 1000 and 1024, and borrowed the SI prefixes.
Back in 1960 the scientific community standardized a consistent set of units for various physical properties, and a set of prefixes for dealing with large multiples or small fractions of a unit.
By the time you reach gigabyte quantities the difference between 10^9 and 2^30 reaches 7.4%. Drive manufacturers were using GB and TB to mean 10^9 and 10^12 bytes, while memory manufacturers were using them for 2^30 and 2^40 bytes.
The IEC responded to the growing issue by standardising a new set of prefixes:
Symbol | Name | Bytes | Equivalent |
---|---|---|---|
KiB | Kibibyte | 2^10 bytes | 1,024 bytes |
MiB | Mebibyte | 2^20 bytes | 1,024 kibibytes |
GiB | Gibibyte | 2^30 bytes | 1,024 mebibytes |
TiB | Tebibyte | 2^40 bytes | 1,024 gibibytes |
PiB | Pebibyte | 2^50 bytes | 1,024 tebibytes |
EiB | Exbibyte | 2^60 bytes | 1,024 pebibytes |
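The gap between the decimal and binary meanings of each prefix grows with every step up the scale. A minimal sketch of the arithmetic (the function name is only illustrative):

```python
# Pairs of (decimal SI unit, binary IEC unit) of the same rank.
UNITS = [("kB", "KiB"), ("MB", "MiB"), ("GB", "GiB"),
         ("TB", "TiB"), ("PB", "PiB"), ("EB", "EiB")]

def binary_excess_percent(rank):
    """Percentage by which 2^(10*rank) exceeds 10^(3*rank).

    rank 1 is kilo/kibi, rank 2 is mega/mebi, and so on.
    """
    return (2 ** (10 * rank) / 10 ** (3 * rank) - 1) * 100

for rank, (si, iec) in enumerate(UNITS, start=1):
    print(f"{iec} is {binary_excess_percent(rank):.1f}% larger than {si}")
```

The first and third lines of output correspond to the 2.4% and 7.4% figures quoted above.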
The POSIX.2 standard in 1992 replaced cpio with pax. Pax keeps the pipeline and copy modes of cpio, but also supports ACLs and extended attributes. Its own archive format, added in the 2001 revision, supports much larger files, and enough user ids for a system with multiple user accounts for everyone in the world. For backwards compatibility and portability it can read and write selected cpio and tar formats.
History of standardization attempts.
Year | Standard | Notes |
---|---|---|
1988 | IEEE Std 1003.1-1988, Standard Data Interchange format | Compliant systems to include cpio and tar. Each utility must support the agreed standard format; additional formats are optional. |
1989-1994 | GNU wrote their own tar utility, with its own proprietary format. :-( | |
1992 | POSIX.2 IEEE Std 1003.2-1992 | Added the pax utility with support for cpio and ustar archive formats. |
1997 | Single UNIX Specification version 2 | Compliant systems to include pax (Portable Archive eXchange), a single utility that can read and write data in the cpio and tar formats standardized in 1988. Working party to finalize an additional format to support larger files and systems. |
2001 | POSIX.1-2001 or IEEE Std 1003.1-2001 or Single UNIX Specification version 3 | pax (Portable Archive eXchange) must support an additional format (pax) that handles very large files. It is a development of the ustar format with additional header types. |
2004 | POSIX.1-2004 or IEEE Std 1003.1-2004 | |
2008 | POSIX.1-2008 or IEEE Std 1003.1-2008 | |
When the developers wrote their first archive program in 1977, they thought ahead. In 1977 a large computer could have a whole MB of storage, on a platter about the size of an LP. They designed a format that would work on a system with 65,000 users, creating files 2,000 times the size of the largest disk then available.
Tar was developed independently a couple of years later, with broadly similar capabilities. Give a problem to several independent sets of developers and they will generally come up with similar solutions.
It was mostly the things that had not been envisaged that caused issues over the following 20 years.
By the late 1990s there were several dozen proprietary modifications for various systems, mostly incompatible with each other. Archive programs would by default generate archives in the proprietary format, which caused lots of problems when you wanted to access the archive from a different computer to the one that created it. Most had the ability to read, and sometimes write, one or more of the standard formats.
When doing this, anything that did not fit the target standard was dropped, or the file creation failed.
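The "file creation failed" case is easy to reproduce with Python's standard `tarfile` module, used here purely as a convenient stand-in for the archive tools discussed in this article. The sketch below writes one empty member whose user id is too large for a ustar header, first in ustar format and then in pax format. The member name and the uid value are arbitrary choices for the example.

```python
import io
import tarfile

def write_and_read_uid(fmt, uid):
    """Write one empty member with the given uid in format `fmt`.

    Returns the uid read back from the archive, or None if the
    format could not represent it and writing failed.
    """
    info = tarfile.TarInfo("example.txt")  # hypothetical member name
    info.uid = uid
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt) as archive:
            archive.addfile(info)
    except ValueError:
        # The value does not fit the fixed-width numeric header field.
        return None
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as archive:
        return archive.getmembers()[0].uid

big_uid = 3_000_000  # above the ustar ceiling of 2,097,151
print("ustar:", write_and_read_uid(tarfile.USTAR_FORMAT, big_uid))
print("pax:  ", write_and_read_uid(tarfile.PAX_FORMAT, big_uid))
```

Other tools may choose to truncate or drop the value rather than fail; which of the two happens is a property of the tool, not of the format.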
The revised POSIX standard in 2001 introduced PAX as an extensible archive format. The originator may add records to the archive describing additional attributes that had not been thought of in 2001. The receiver can process those attributes it recognises and ignore the rest.
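As a rough illustration of how such extension records work: each record in a pax extended header is a line of the form `length keyword=value`, where the decimal length counts the whole record, including its own digits and the trailing newline. The sketch below builds and parses records by hand. The keyword `EXAMPLE.colour` and the set of "known" keywords are invented for the example, and a real reader would do far more validation.

```python
def pax_record(key, value):
    """Encode one record; the leading length covers the whole record."""
    body = f" {key}={value}\n".encode("utf-8")
    length = len(body)
    # The length field counts its own digits, so repeat until stable.
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body

def parse_records(data):
    """Decode a run of records into a dict of keyword -> value."""
    records = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        # Skip the space after the length, drop the trailing newline.
        key, _, value = data[space + 1:pos + length - 1].partition(b"=")
        records[key.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return records

KNOWN = {"path", "size", "uid"}  # what this imaginary receiver understands

blob = pax_record("path", "foo") + pax_record("EXAMPLE.colour", "green")
received = parse_records(blob)
understood = {k: v for k, v in received.items() if k in KNOWN}
ignored = sorted(set(received) - KNOWN)
print(understood, ignored)
```

Because every record announces its own length, a receiver can step over a keyword it has never heard of without needing to understand the value.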
Summary format list. Note that the codes are used as column headings in the next table. As a general complication, different programs, or in some cases different versions of the same program, use the same format code for different formats. :-(
Code | Year | Format | Maximum file size | Maximum user/group id | Maximum pathname length | dev_t size in bits (major/minor) | ACL | Extended attributes / SELinux |
---|---|---|---|---|---|---|---|---|
bin | 1977 | Original CPIO binary format, developed by AT&T | 2 GB | 65535 | 256 | 16 | No | No |
tar | ? | Original tar format | 2 GB | 65535 | 155/100? | 16? | No | No |
odc | 1980 | Original CPIO ascii format, introduced with System III, IEEE 1003.1, universally supported | 8 GB | 262143 | 256 | 18 | No | No |
uv7tar | 1980 | Edition 7 of tar utility. 1980 | 8 GB | 2,097,151 | 99 | N/A | No | No |
star | 1985 | Modified tar format, used by star. | 8 GB | 2,097,151 | 256(99) | 24/24 | No | No |
cpio | 1988 | POSIX cpio standard | 8 GB | 262143 | 262143 | 18 | No | No |
ustar | 1988? | IEEE/POSIX1003/IEC-9945-1-1988 Standard Data Interchange format. | 8 GB | 2,097,151 | 256(99) | 21/21 | No | No |
svr4 | ? | CPIO ascii | 4 GB | 4.3e9 (2^32-1) | 1024 | 32/32 | | |
crc | ? | CPIO ascii + checksum | 4 GB | 4.3e9 (2^32-1) | 1024 | 32/32 | | |
dec | ? | odc modified by DEC | 8 GB | 262,143 | 256 | 24/24 | | |
gnutar | 1989 | Format used by PD tar/GNU tar <=1.12 | ? | ? | ? | ? | No | No |
bar | ? | | 8 GB | 2,097,151 | 427 | 21 | | |
sgi | ? | bin modified by SGI | 9 EB | 65,535 | 256 | 14/18 | | |
sco | ? | svr4 modified by SCO | 9 EB | 4.3e9 (2^32-1) | 1024 | 32/32 | | |
scocrc | ? | crc modified by SCO | 9 EB | 4.3e9 (2^32-1) | 1024 | 32/32 | | |
zip | 1990? | | 9 EB | 4.3e9 (2^32-1) | 60000 | 32 | | |
cray | 1993? | bin modified by Cray, all fields 64 bit. | 9 EB | 1.8e19 (2^64-1) | 65535 | 64 | | |
xstar | 1994 | Extended version of star | 9 EB | 1.8e19 (2^64-1) | 65535 | 21/21 | | |
sun | 1997? | ustar extended by Sun | 9 EB | 1.8e19 (2^64-1) | 65535 | 63/63 | | |
xustar | 1998 | xstar without tar signature. | 9 EB | 1.8e19 (2^64-1) | 65535 | 21/21 | | |
gnu | 1999? | uv7tar? extended by GNU 1.13 | 9 EB | 1.8e19 (2^64-1) | 65535 | 63/63 | No | No |
pax | 2001? | POSIX-1003.1-2001 Standard Data Interchange format. (extended version of sun/ustar?) | 9 EB | 1.8e19 (2^64-1) | 65535 | 21/21 | Yes | ? |
? | 2004 | GNU tar 1.15 | 9 EB | 1.8e19 (2^64-1) | 65535 | 63/63 | Yes | Yes |
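Several of the ceilings in the table follow directly from the width of a fixed-size header field. The sketch below reproduces a few of them. The field widths used (16-bit binary, 6, 7 and 11 octal digits, 8 hexadecimal digits) are the ones I understand to be commonly documented for these header layouts, so treat them as assumptions rather than a reading of the standards.

```python
def max_binary(bits):
    """Largest unsigned value in a binary field of `bits` bits."""
    return 2 ** bits - 1

def max_octal(digits):
    """Largest value in a field of `digits` ASCII octal characters."""
    return 8 ** digits - 1

def max_hex(digits):
    """Largest value in a field of `digits` ASCII hexadecimal characters."""
    return 16 ** digits - 1

# (format, field, assumed width, resulting ceiling)
rows = [
    ("bin",   "uid",  "16 bits",         max_binary(16)),
    ("odc",   "uid",  "6 octal digits",  max_octal(6)),
    ("odc",   "size", "11 octal digits", max_octal(11)),
    ("ustar", "uid",  "7 octal digits",  max_octal(7)),
    ("ustar", "size", "11 octal digits", max_octal(11)),
    ("svr4",  "uid",  "8 hex digits",    max_hex(8)),
    ("svr4",  "size", "8 hex digits",    max_hex(8)),
]
for fmt, field, width, ceiling in rows:
    print(f"{fmt:6} {field:5} {width:16} {ceiling:,}")
```

Eleven octal digits give one byte short of 8 GiB and eight hexadecimal digits one byte short of 4 GiB, which is where the 8 GB and 4 GB entries above come from.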
This is not the whole story. Some formats, notably the POSIX standard from 2001, provide an extension mechanism for additional attributes or metadata. Currently this includes access control lists, extended attributes, SELinux contexts, ...
Just because a particular tool can read or write a format does not mean it supports all of the extensions allowed in the format. When creating an archive, a tool can be asked to write those attributes it knows about to the archive. When restoring, tools mainly just ignore any attributes recorded in the archive that they do not understand.
For transferring or archiving data files, extended attributes are unlikely to matter very much. However, for backing up an operating system that you expect to be able to restore and run, they do.
OS | Repo | Path | Version | bin | odc | svr4 | crc | uv7tar | ustar | xstar | pax | sun | gnutar | zip |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AIX 3.x | ? | /usr/bin/cpio | ? | default | -c | | | | | | | | | |
AIX 4.2 | ? | /usr/bin/cpio | ? | default | -c | | | | | | | | | |
AIX 4.2 | ? | /usr/bin/pax | ? | | -x cpio | | | | -x ustar | | | | | |
AIX 4.2 | ? | /usr/bin/tar | ? | | | | | According to manual old <2GB tar format only | | | | | | |
AIX 4.3 | ? | /usr/bin/cpio | ? | default | -c | | | | | | | | | |
AIX 4.3 | ? | /usr/bin/pax | ? | | -x cpio | | | | -x ustar | | -x pax | | | |
AIX 4.3 | ? | /usr/bin/tar | ? | | | | | According to manual old <2GB tar format only | | | | | | |
AIX 5.x | ? | /usr/bin/cpio | ? | default | -c | | | | | | | | | |
AIX 5.x | ? | /usr/sysv/bin/cpio | ? | | -c -Hodc | | -Hcrc | -Htar | -Hustar | | | | | |
AIX 5.x | ? | /usr/bin/pax | ? | | -x cpio | | | | -x ustar | | -x pax | | | |
AIX 5.x | ? | /usr/bin/tar | ? | | | | | | default | | | | | |
Fedora 3 | base | /bin/cpio | 2.5 | -H bin | -H odc | -c -H newc | -H crc | -H tar | -H ustar | | | | | |
Fedora 3 | base | /usr/bin/pax | 3.0 | -x bcpio | -x cpio | | -x sv4crc | -x tar | -x ustar | | | | | |
Fedora 3 | base | /usr/bin/star | 1.5 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
Fedora 3 | base | /bin/tar | 1.14 | | | | | -o | | | | | | |
Fedora 3 | base | /usr/bin/ustar | 1.5 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
Fedora 3 | base | /usr/bin/zip | 2.3 | | | | | | | | | | | default |
CentOS 5.x | ? | /bin/cpio | 2.6 | -H bin | -H odc | -c -H newc | -H crc | -H tar | -H ustar | | | | | |
CentOS 5.x | base | /bin/gtar | 1.15 | | | | | --format v7 | --format ustar | | --format posix | | --format oldgnu | |
CentOS 5.x | ? | /usr/bin/pax | 3.4 | -x bcpio | -x cpio | | -x sv4crc | -x tar | -x ustar | | | | | |
CentOS 5.x | ? | /usr/bin/spax | ? | | -x cpio | | | | -x ustar | | -x pax | | | |
CentOS 5.x | base | /usr/bin/star | 1.5 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
CentOS 5.x | base | /usr/bin/ustar | 1.5 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
CentOS 6.x | base | /bin/cpio | 2.11 | -H bin | -H odc | -c -H newc | -H crc | -H tar | -H ustar | | | | | |
CentOS 6.x | base | /usr/bin/pax | 3.4 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
CentOS 6.x | base | /usr/bin/spax | 1.5 | | -x cpio | | | | -x ustar | | -x pax | | | |
CentOS 6.x | base | /bin/gtar | 1.23 | | | | | -H v7 | -H ustar | | -H pax -H posix | | -H gnu -H oldgnu | |
CentOS 6.x | base | /usr/bin/star | 1.5 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
CentOS 6.x | base | /usr/bin/ustar | 1.5 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
CentOS 7.x | base | /usr/bin/bsdcpio | 3.1.2 | | default -c -H odc -H cpio | -H newc | | | -H ustar | | -H pax | | | |
CentOS 7.x | base | /bin/cpio | 2.11 | -H bin | -H odc | -c -H newc | -H crc | -H tar | -H ustar | | | | | |
CentOS 7.x | base | /usr/bin/pax | 3.4 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
CentOS 7.x | base | /usr/bin/opax | 3.4 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
CentOS 7.x | base | /usr/bin/spax | 1.5 | | -x cpio | | | | -x ustar | | -x pax | | | |
CentOS 7.x | base | /bin/gtar | 1.26 | | | | | -H v7 | -H ustar | | -H pax -H posix | | -H gnu -H oldgnu | |
CentOS 7.x | base | /usr/bin/star | 1.5 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
CentOS 7.x | base | /usr/bin/ustar | 1.5 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
Fedora 17 | fedora | /usr/bin/bsdcpio | 3.1.4 | | default -c -H odc -H cpio | -H newc | | | -H ustar | | -H pax | | | |
Fedora 17 | fedora | /bin/cpio | 2.11 | -H bin | -H odc | -c -H newc | -H crc | -H tar | -H ustar | | | | | |
Fedora 17 | base | /usr/bin/pax | 3.4 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
Fedora 17 | base | /usr/bin/spax | 1.5.1 | | -x cpio | | | | -x ustar | | -x pax | | | |
Fedora 17 | base | /bin/gtar | 1.26 | | | | | -H v7 | -H ustar | | -H pax -H posix | | -H gnu -H oldgnu | |
Fedora 17 | base | /usr/bin/star | 1.5.1 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
Fedora 17 | base | /usr/bin/ustar | 1.5.1 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
Fedora 17 | base | /usr/bin/zip | 3.0 | | | | | | | | | | | default |
Fedora 24 | base | /usr/bin/bsdcpio | 3.2.1 | | default -c -H odc -H cpio | -H newc | | | -H ustar | | -H pax | | | |
Fedora 24 | base | /bin/cpio | 2.12.3 | -H bin | -H odc | -c -H newc | -H crc | -H tar | -H ustar | | | | | |
Fedora 24 | base | /usr/bin/pax | 3.4.23 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
Fedora 24 | base | /usr/bin/opax | 3.4.23 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
Fedora 24 | base | /usr/bin/spax | 1.5.3 | | -x cpio | | | | -x ustar | | -x pax | | | |
Fedora 24 | base | /bin/gtar | 1.28 | | | | | -H v7 | -H ustar | | -H pax -H posix | | -H gnu -H oldgnu | |
Fedora 24 | base | /usr/bin/star | 1.5.3 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
Fedora 24 | base | /usr/bin/ustar | 1.5.3 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
CentOS 8.x | baseos | /bin/cpio | 2.12 | -H bin | -H odc | -c -H newc | -H crc | -H tar | -H ustar | | | | | |
CentOS 8.x | epel | /usr/bin/pax | 1.5.3 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
CentOS 8.x | epel | /usr/bin/opax | 3.4 | -x bcpio | -x cpio | -x sv4cpio | -x sv4crc | -x tar | -x ustar | | | | | |
CentOS 8.x | baseos | /usr/bin/spax | 1.5.3 | | -x cpio | | | | -x ustar | | -x pax | | | |
CentOS 8.x | baseos | /bin/gtar | 1.30 | | | | | -H v7 | -H ustar | | -H pax -H posix | | -H gnu -H oldgnu | |
CentOS 8.x | baseos | /usr/bin/star | 1.5.3 | artype=bin | artype=odc | | | artype=v7tar | artype=ustar | default | artype=pax | artype=suntar | artype=gnutar | |
CentOS 8.x | baseos | /usr/bin/ustar | 1.5.3 | artype=bin | artype=odc | | | artype=v7tar | default | artype=xstar | artype=pax | artype=suntar | artype=gnutar | |
CentOS 8.x | baseos | /usr/bin/bsdtar | 3.3.3 | | | | | | | | | | | |
Feature set, option names, and archive format have changed a lot over the years.
1999 version 1.13 new archive format
2003 version 1.14 new archive formats, support for sparse files
2004 version 1.15 new archive format, supporting acl, xattr, selinux
2007 version 1.17 rework option names....
version | date | OS | Default format | oldgnu | v7 | ustar | gnu | pax | Sparse | acls | xattrs | selinux |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1.11 | ? | | oldgnu | | | | | | | | | |
1.12 | 25/11/1998 | old portability. =oldarchive | ||||||||||
? | 25/4/1997 | |||||||||||
1.13.25 | 1999 | RedHat 7.3 RedHat 9.0 SUSE 9.1 | gnu | |||||||||
? | 13/11/2003 | ? | | -H oldgnu | -H v7 | -H ustar | | | | | | |
1.14 | 10/5/2004 | Fedora 3 SUSE 9.2 | ? | | -o | | | | --sparse | N/A | N/A | N/A |
1.15.1 | 21/12/2004 | Centos 5.x Fedora 5 Fedora 6 Fedora 7 | gnu | --format oldgnu | -o --old-archive --portability --format v7 | --format ustar | --format gnu | --format posix --posix | --sparse | --acls | --xattrs | --selinux |
1.17 | 8/6/2007 | Fedora 8 | ? | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax -H posix | ||||
1.19 | 10/10/2009 | Fedora 9 | ? | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax -H posix | ||||
1.20 | ? | Fedora 10 Debian {CoLinux}? | ? | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax -H posix | ||||
1.22 | ? | Fedora 12 | ? | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax -H posix | ||||
1.23 | 10/3/2010 | Centos 6.x | gnu | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax -H posix | --sparse | --acls | --xattrs | --selinux |
1.26 | 12/3/2011 | Fedora 16 Fedora 17 | gnu | -H oldgnu | -H v7 | -H ustar | -H gnu | -H pax -H posix | --sparse | --acls | --xattrs | --selinux |
The package star provides the commands /usr/bin/star and /usr/bin/ustar.
The star command supports creating and reading a wide range of archive formats. In all cases the restrictions of the format apply.
When invoked as ustar the default output format is ustar, and the restrictions of that format apply: maximum file size 8 GB, no ACLs, and no extended attributes.
When invoked as star the default output format is xstar: maximum file size 8 GB, no ACLs, and no extended attributes.
Specifying pax as the output format gives a file that complies with the standard without any extended headers.
Sparse files are not supported in tar, ustar, suntar, pax, or any cpio variant. I think that leaves star, gnutar, xstar, xustar, and exustar.
ACLs (access control lists) are only supported for the format exustar.
Extended attributes are only supported when the format is exustar.
version | date | OS | Default format | oldgnu | v7 | ustar | gnu | pax | Sparse | acls | xattrs | selinux |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1.5 | 2008? | CentOS 5.x | ||||||||||
1.5 | 2008? | CentOS 6.x | ||||||||||
1.5.2-5 | 2008? | Fedora 19 | ||||||||||
1.5.2-13 | 2008? | CentOS 7.x | xstar |
The package spax provides the commands /usr/bin/spax and /usr/bin/pax.
version | date | OS | Default format | cpio | ustar | pax | Sparse | acls | xattrs | selinux |
---|---|---|---|---|---|---|---|---|---|---|
1.5 | 2008? | CentOS 5.x | ||||||||
1.5 | 2008? | CentOS 6.x | ||||||||
1.5.2-5 | 2008? | Fedora 19 | ||||||||
1.5.2-13 | 2008? | CentOS 7.x | pax | -x cpio | -x ustar | -x pax |
Provides the commands /usr/bin/opax and /usr/bin/pax.
Confusingly, this version of the pax command does not support the 2001 standardized Portable Archive eXchange format (PAX). It does, however, support additional formats.
Ported from BSD to SuSE and released in 2001 as version 3.0; version 3.4 came in 2005.
version | date | OS | Default format | cpio | ustar | gnu | pax |
---|---|---|---|---|---|---|---|
3.4-2 | 2008? | CentOS 5.x | |||||
3.4-10 | 2008? | CentOS 6.x | |||||
3.4-16 | 2008? | Fedora 19 | |||||
3.4-19 | 1994? | CentOS 7.x | ustar | -x cpio | -x ustar | N/A | N/A |
|
version | date | OS | Default format | oldgnu | v7 | ustar | gnu | pax | Sparse | acls | xattrs | selinux |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3.1.2-10 | ? | CentOS 7.x |