tar

format of tape archive files

TLDR

Create an archive from files

>_ tar cf [target.tar] [file1] [file2] [file3]

Create a gzipped archive

>_ tar czf [target.tar.gz] [file1] [file2] [file3]

Create a gzipped archive from a directory using relative paths

>_ tar czf [target.tar.gz] -C [path/to/directory] .

Extract a (compressed) archive into the current directory

>_ tar xf [source.tar[.gz|.bz2|.xz]]

Extract an archive into a target directory

>_ tar xf [source.tar] -C [directory]

Create a compressed archive, using archive suffix to determine the compression program

>_ tar caf [target.tar.xz] [file1] [file2] [file3]

List the contents of a tar file

>_ tar tvf [source.tar]

Extract files matching a pattern

>_ tar xf [source.tar] --wildcards ["*.html"]

Extract a specific file without preserving the folder structure

>_ tar xf [source.tar] [source.tar/path/to/extract] --strip-components=[depth_to_strip]

DESCRIPTION

The tar archive format collects any number of files, directories, and other file system objects (symbolic links, device nodes, etc.) into a single stream of bytes. The format was originally designed to be used with tape drives that operate with fixed-size blocks, but is widely used as a general packaging mechanism.

General Format

A tar archive consists of a series of 512-byte records. Each file system object requires a header record which stores basic metadata (pathname, owner, permissions, etc.) and zero or more records containing any file data. The end of the archive is indicated by two records consisting entirely of zero bytes.
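The record arithmetic described above can be sketched in C. This is only an illustration: the names `TAR_RECORD_SIZE`, `tar_entry_records`, and `tar_is_zero_record` are hypothetical and are not taken from any tar implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define TAR_RECORD_SIZE 512

/* Records occupied by one entry: one header record plus the file
 * data rounded up to a whole number of 512-byte records. */
uint64_t tar_entry_records(uint64_t data_size)
{
    return 1 + (data_size + TAR_RECORD_SIZE - 1) / TAR_RECORD_SIZE;
}

/* Returns 1 if the record consists entirely of zero bytes, as the
 * two end-of-archive records do; returns 0 otherwise. */
int tar_is_zero_record(const unsigned char *record)
{
    size_t i;

    for (i = 0; i < TAR_RECORD_SIZE; i++) {
        if (record[i] != 0)
            return 0;
    }
    return 1;
}
```

A reader built along these lines would stop once it has seen two consecutive records for which `tar_is_zero_record` returns 1.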

For compatibility with tape drives that use fixed block sizes, programs that read or write tar files always read or write a fixed number of records with each I/O operation. These ``blocks'' are always a multiple of the record size. The maximum block size supported by early implementations was 10240 bytes or 20 records. This is still the default for most implementations although block sizes of 1MiB (2048 records) or larger are commonly used with modern high-speed tape drives. (Note: the terms ``block'' and ``record'' here are not entirely standard; this document follows the convention established by John Gilmore in documenting pdtar.)
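As a rough sketch of the blocking rule, the following C function estimates the on-disk size of an archive once the writer has padded its output to a whole number of blocks. The function name `tar_blocked_size` and the assumption that the writer pads the final block with zero records are illustrative, not a statement about any particular implementation.

```c
#include <stdint.h>

#define TAR_RECORD_SIZE 512

/* Size in bytes of an archive after padding to whole blocks.
 * payload_records counts all header and data records; two zero
 * records are added for the end-of-archive marker.
 * blocking_factor is the number of records per block (20 by default)
 * and is assumed to be nonzero. */
uint64_t tar_blocked_size(uint64_t payload_records, unsigned blocking_factor)
{
    uint64_t records = payload_records + 2;
    uint64_t blocks = (records + blocking_factor - 1) / blocking_factor;

    return blocks * blocking_factor * TAR_RECORD_SIZE;
}
```

Under these assumptions, an archive holding a single one-byte file (one header record and one data record) still occupies 10240 bytes with the default blocking factor of 20.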

Old-Style Archive Format

The original tar archive format has been extended many times to include additional information that various implementors found necessary. This section describes the variant implemented by the tar command included in Version 7 AT&T UNIX which seems to be the earliest widely-used version of the tar program.

The header record for an old-style tar archive consists of the following:

    struct header_old_tar {
        char name[100];
        char mode[8];
        char uid[8];
        char gid[8];
        char size[12];
        char mtime[12];
        char checksum[8];
        char linkflag[1];
        char linkname[100];
        char pad[255];
    };

All unused bytes in the header record are filled with nulls.

Early tar implementations varied in how they terminated these fields. The tar command in Version 7 AT&T UNIX used the following conventions (this is also documented in early BSD manpages): the pathname must be null-terminated; the mode, uid, and gid fields must end in a space and a null byte; the size and mtime fields must end in a space; the checksum is terminated by a null and a space. Early implementations filled the numeric fields with leading spaces. This seems to have been common practice until the IEEE Std 1003.1-1988 (``POSIX.1'') standard was released. For best portability, modern implementations should fill the numeric fields with leading zeros.
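The termination conventions above, combined with the recommended leading zeros, might be implemented roughly as follows. The helper names `tar_format_octal8` and `tar_format_octal12` are hypothetical, and masking oversized values is a simplification made only for this sketch.

```c
#include <stdio.h>
#include <string.h>

/* 8-byte field (mode, uid, gid): six octal digits with leading
 * zeros, then a space and a null byte. Values that do not fit are
 * masked here for simplicity; a real writer would report an error. */
void tar_format_octal8(char *field, unsigned long value)
{
    snprintf(field, 8, "%06lo ", value & 0777777UL);
}

/* 12-byte field (size, mtime): eleven octal digits with leading
 * zeros, then a space. There is no room for a null byte, so the
 * text is built in a scratch buffer and only 12 bytes are copied. */
void tar_format_octal12(char *field, unsigned long long value)
{
    char tmp[13];

    snprintf(tmp, sizeof tmp, "%011llo ", value & 077777777777ULL);
    memcpy(field, tmp, 12);
}
```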

Pre-POSIX Archives

An early draft of IEEE Std 1003.1-1988 (``POSIX.1'') served as the basis for John Gilmore's pdtar program and many system implementations from the late 1980s and early 1990s. These archives generally follow the POSIX ustar format described below with the following variations:

POSIX ustar Archives

IEEE Std 1003.1-1988 (``POSIX.1'') defined a standard tar file format to be read and written by compliant implementations of tar(1). This format is often called the ``ustar'' format, after the magic value used in the header. (The name is an acronym for ``Unix Standard TAR''.) It extends the historic format with new fields:

    struct header_posix_ustar {
        char name[100];
        char mode[8];
        char uid[8];
        char gid[8];
        char size[12];
        char mtime[12];
        char checksum[8];
        char typeflag[1];
        char linkname[100];
        char magic[6];
        char version[2];
        char uname[32];
        char gname[32];
        char devmajor[8];
        char devminor[8];
        char prefix[155];
        char pad[12];
    };

Note that all unused bytes must be set to NUL.

Field termination is specified slightly differently by POSIX than by previous implementations. The magic, uname, and gname fields must have a trailing NUL. The pathname, linkname, and prefix fields must have a trailing NUL unless they fill the entire field. (In particular, it is possible to store a 256-character pathname if it happens to have a / as the 156th character.) POSIX requires numeric fields to be zero-padded in the front, and requires them to be terminated with either space or NUL characters.
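The way a long pathname is divided between the 155-byte prefix field and the 100-byte name field can be sketched as below. The function name `ustar_split_path` is hypothetical, and the output buffers are null-terminated only for the caller's convenience; as noted above, the header fields themselves need no terminator when full.

```c
#include <string.h>

/* Split path into a ustar prefix (at most 155 bytes) and name (at
 * most 100 bytes) at a '/' separator, which is itself not stored.
 * prefix must hold 156 bytes and name 101 bytes.
 * Returns 0 on success, -1 if the path cannot be represented. */
int ustar_split_path(const char *path, char *prefix, char *name)
{
    size_t len = strlen(path);
    size_t i;

    if (len == 0)
        return -1;
    if (len <= 100) {
        /* Short paths fit in the name field alone. */
        prefix[0] = '\0';
        memcpy(name, path, len + 1);
        return 0;
    }
    /* Try each '/' at index i: the prefix is path[0..i) and the
     * name is everything after the separator. */
    for (i = 1; i < len - 1 && i <= 155; i++) {
        if (path[i] == '/' && len - i - 1 <= 100) {
            memcpy(prefix, path, i);
            prefix[i] = '\0';
            memcpy(name, path + i + 1, len - i);
            return 0;
        }
    }
    return -1;
}
```

With this sketch, a 256-character path whose 156th character is / splits into a full 155-byte prefix and a full 100-byte name, while a 101-character path containing no / cannot be stored.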

Currently, most tar implementations comply with the ustar format, occasionally extending it by adding new fields to the blank area at the end of the header record.

Numeric Extensions

There have been several attempts to extend the range of sizes or times supported by modifying how numbers are stored in the header.

One obvious extension to increase the size of files is to eliminate the terminating characters from the various numeric fields. For example, the standard only allows the size field to contain 11 octal digits, reserving the twelfth byte for a trailing NUL character. Allowing 12 octal digits allows file sizes up to 64 GB.

Another extension, utilized by GNU tar, star, and other newer tar implementations, permits binary numbers in the standard numeric fields. This is flagged by setting the high bit of the first byte. The remainder of the field is treated as a signed twos-complement value. This permits 95-bit values for the length and time fields and 63-bit values for the uid, gid, and device numbers. In particular, this provides a consistent way to handle negative time values. GNU tar supports this extension for the length, mtime, ctime, and atime fields. Joerg Schilling's star program and the libarchive library support this extension for all numeric fields. Note that this extension is largely obsoleted by the extended attribute record provided by the pax interchange format.
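A reader that accepts both the octal text form and the binary form might decode a numeric field along the following lines. This is a minimal sketch: the name `tar_decode_number` is hypothetical, only values that fit in 64 bits are handled correctly, and the final conversion to a signed type relies on the two's-complement behavior of common compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a numeric header field of len bytes (len >= 1). */
int64_t tar_decode_number(const unsigned char *field, size_t len)
{
    uint64_t v;
    size_t i;

    if (field[0] & 0x80) {
        /* Binary form: the high bit is the flag. The remaining bits
         * form a signed value, so sign-extend from bit 6. */
        v = (field[0] & 0x40) ? ~(uint64_t)0 : 0;
        v = (v << 7) | (uint64_t)(field[0] & 0x7f);
        for (i = 1; i < len; i++)
            v = (v << 8) | field[i];
        return (int64_t)v;
    }

    /* Octal text form: skip leading spaces, then read digits until
     * a space, a null byte, or the end of the field. */
    v = 0;
    i = 0;
    while (i < len && field[i] == ' ')
        i++;
    while (i < len && field[i] >= '0' && field[i] <= '7') {
        v = (v << 3) | (uint64_t)(field[i] - '0');
        i++;
    }
    return (int64_t)v;
}
```

Because the octal loop runs to the end of the field when no terminator is present, the same sketch also accepts the 12-digit size fields described above.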

Another early GNU extension allowed base-64 values rather than octal. This extension was short-lived and is no longer supported by any implementation.

Pax Interchange Format

There are many attributes that cannot be portably stored in a POSIX ustar archive. IEEE Std 1003.1-2001 (``POSIX.1'') defined a ``pax interchange format'' that uses two new types of entries to hold text-formatted metadata that applies to following entries. Note that a pax interchange format archive is a ustar archive in every respect. The new data is stored in ustar-compatible archive entries that use the ``x'' or ``g'' typeflag. In particular, older implementations that do not fully support these extensions will extract the metadata into regular files, where the metadata can be examined as necessary.

An entry in a pax interchange format archive consists of one or two standard ustar entries, each with its own header and data. The first optional entry stores the extended attributes for the following entry. This optional first entry has an "x" typeflag and a size field that indicates the total size of the extended attributes. The extended attributes themselves are stored as a series of text-format lines encoded in the portable UTF-8 encoding. Each line consists of a decimal number, a space, a key string, an equals sign, a value string, and a new line. The decimal number indicates the length of the entire line, including the initial length field and the trailing newline. An example of such a field is:

    25 ctime=1084839148.1212\n

Keys in all lowercase are standard keys. Vendors can add their own keys by prefixing them with an all uppercase vendor name and a period. Note that, unlike the historic header, numeric values are stored using decimal, not octal. A description of some common keys follows:
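Because the length prefix counts its own digits, a writer has to settle on the length iteratively. The following C sketch shows one way to do so; the function name `pax_format_record` and its buffer-based interface are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Write "<len> <key>=<value>\n" into out, where <len> is the length
 * of the whole line including the digits of <len> itself.
 * Returns the line length, or 0 if out is too small. */
size_t pax_format_record(char *out, size_t outsize,
                         const char *key, const char *value)
{
    /* Everything except the length digits:
     * space + key + '=' + value + newline. */
    size_t body = 1 + strlen(key) + 1 + strlen(value) + 1;
    size_t total = body + 1;   /* first guess: a one-digit length */
    size_t digits, t;

    /* Grow the guess until it equals body plus its own digit count. */
    for (;;) {
        digits = 0;
        for (t = total; t > 0; t /= 10)
            digits++;
        if (body + digits == total)
            break;
        total = body + digits;
    }

    if (total + 1 > outsize)
        return 0;
    snprintf(out, outsize, "%zu %s=%s\n", total, key, value);
    return total;
}
```

Called with the key ctime and the value 1084839148.1212, this sketch yields the 25-byte line given as the example above.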

Any values stored in an extended attribute override the corresponding values in the regular tar header. Note that compliant readers should ignore the regular fields when they are overridden. This is important, as existing archivers are known to store non-compliant values in the standard header fields in this situation. There are no limits on length for any of these fields. In particular, numeric fields can be arbitrarily large. All text fields are encoded in UTF-8. Compliant writers should store only portable 7-bit ASCII characters in the standard ustar header and use extended attributes whenever a text value contains non-ASCII characters.

In addition to the x entry described above, the pax interchange format also supports a g entry. The g entry is identical in format, but specifies attributes that serve as defaults for all subsequent archive entries. The g entry is not widely used.

Besides the new x and g entries, the pax interchange format has a few other minor variations from the earlier ustar format. The most troubling one is that hardlinks are permitted to have data following them. This allows readers to restore any hardlink to a file without having to rewind the archive to find an earlier entry. However, it creates complications for robust readers, as it is no longer clear whether or not they should ignore the size field for hardlink entries.

GNU Tar Archives

The GNU tar program started with a pre-POSIX format similar to that described earlier and has extended it using several different mechanisms: It added new fields to the empty space in the header (some of which was later used by POSIX for conflicting purposes); it allowed the header to be continued over multiple records; and it defined new entries that modify following entries (similar in principle to the x entry described above, but each GNU special entry is single-purpose, unlike the general-purpose x entry). As a result, GNU tar archives are not POSIX compatible, although more lenient POSIX-compliant readers can successfully extract most GNU tar archives.

    struct header_gnu_tar {
        char name[100];
        char mode[8];
        char uid[8];
        char gid[8];
        char size[12];
        char mtime[12];
        char checksum[8];
        char typeflag[1];
        char linkname[100];
        char magic[6];
        char version[2];
        char uname[32];
        char gname[32];
        char devmajor[8];
        char devminor[8];
        char atime[12];
        char ctime[12];
        char offset[12];
        char longnames[4];
        char unused[1];
        struct {
            char offset[12];
            char numbytes[12];
        } sparse[4];
        char isextended[1];
        char realsize[12];
        char pad[17];
    };

GNU tar pax archives

GNU tar 1.14 (XXX check this XXX) and later will write pax interchange format archives when you specify the --posix flag. This format follows the pax interchange format closely, using some SCHILY tags and introducing new keywords to store sparse file information. There have been three iterations of the sparse file support, referred to as ``0.0'', ``0.1'', and ``1.0''.

Solaris Tar

XXX More Details Needed XXX

Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an ``extended'' format that is fundamentally similar to pax interchange format, with the following differences:

AIX Tar

XXX More details needed XXX

AIX Tar uses a ustar-formatted header with the type A for storing coded ACL information. Unlike the Solaris format, AIX tar writes this header after the regular file body to which it applies. The pathname in this header is either NFS4 or AIXC to indicate the type of ACL stored. The actual ACL is stored in platform-specific binary format.

Mac OS X Tar

The tar distributed with Apple's Mac OS X stores most regular files as two separate files in the tar archive. The two files have the same name except that the first one has ``._'' prepended to the last path element. This special file stores an AppleDouble-encoded binary blob with additional metadata about the second file, including ACL, extended attributes, and resources. To recreate the original file on disk, each separate file can be extracted and the Mac OS X copyfile() function can be used to unpack the separate metadata file and apply it to the regular file. Conversely, the same function provides a ``pack'' option to encode the extended metadata from a file into a separate file whose contents can then be put into a tar archive.
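The naming rule for the metadata file can be illustrated with a short C sketch. The function name `appledouble_name` is hypothetical, and the path is assumed to have no trailing slash.

```c
#include <string.h>

/* Build the name of the AppleDouble metadata entry for path by
 * prepending "._" to the last path element.
 * Returns 0 on success, -1 if out is too small. */
int appledouble_name(const char *path, char *out, size_t outsize)
{
    const char *slash = strrchr(path, '/');
    size_t dirlen = slash ? (size_t)(slash - path) + 1 : 0;
    size_t len = strlen(path);

    /* Two extra bytes for "._" plus the terminating null. */
    if (len + 3 > outsize)
        return -1;
    memcpy(out, path, dirlen);
    memcpy(out + dirlen, "._", 2);
    memcpy(out + dirlen + 2, path + dirlen, len - dirlen + 1);
    return 0;
}
```

For a path such as dir/sub/file.txt, the sketch produces dir/sub/._file.txt.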

Note that the Apple extended attributes interact badly with long filenames. Since each file is stored with the full name, a separate set of extensions needs to be included in the archive for each one, doubling the overhead required for files with long names.

Summary of tar type codes

The following list is a condensed summary of the type codes used in tar header records generated by different tar implementations. More details about specific implementations can be found above:

STANDARDS

The tar utility is no longer a part of POSIX or the Single Unix Standard. It last appeared in Version 2 of the Single UNIX Specification (``SUSv2''). It has been supplanted in subsequent standards by pax(1). The ustar format is currently part of the specification for the pax(1) utility. The pax interchange file format is new with IEEE Std 1003.1-2001 (``POSIX.1'').

HISTORY

A tar command appeared in Seventh Edition Unix, which was released in January, 1979. It replaced the tp program from Fourth Edition Unix which in turn replaced the tap program from First Edition Unix. John Gilmore's pdtar public-domain implementation (circa 1987) was highly influential and formed the basis of GNU tar (circa 1988). Joerg Schilling's star archiver is another open-source (CDDL) archiver (originally developed circa 1985) which features complete support for pax interchange format.

This documentation was written as part of the libarchive and bsdtar project by Tim Kientzle <kientzle@FreeBSD.org>.

SEE ALSO

ar(1), pax(1), tar(1)
