
Adding Large File Support to the Single UNIX® Specification

A White Paper from the X/Open Base Working Group.

Abstract

This paper is an abridged version of the submission received by X/Open from the Large File Summit, an industry initiative to produce a common specification for support of files that are bigger than the current limit of 2GB on existing 32-bit systems. It details the modifications to X/Open's Single UNIX Specification to support large files with unlimited file offsets. These changes have been incorporated into the next issue of the Single UNIX Specification.

This document is based on the 20Mar96 Large File Summit Submission. It has been abridged to refer only to the set of changes to the Single UNIX Specification.

Last Update: 14Aug96


Table of Contents

Adding Support for Arbitrary File Sizes to the Single UNIX Specification
1.0 Overview
1.1 The Large File Problem
2.0 Changes to the Single UNIX Specification
2.1 Changes to CAE Specification System Interface Definitions, Issue 4, Version 2
2.2 Changes to CAE Specification System Interfaces and Headers, Issue 4, Version 2
2.2.1 Changes to System Interfaces
2.2.2 Changes to Headers
2.3 Changes to CAE Specification Commands and Utilities, Issue 4, Version 2
Appendix A: Rationale and Notes
A.1 Overview
A.2 Changes to the Single UNIX Specification
A.2.1 Changes to CAE Specification System Interfaces and Headers, Issue 4, Version 2
A.2.1.1 Changes to System Interfaces
A.2.2 Changes to CAE Specification Commands and Utilities, Issue 4, Version 2

Acknowledgements


Acknowledgements

X/Open gratefully acknowledges the Large File Summit for their work in developing the set of changes to X/Open's Single UNIX Specification to support large files.

For further details of the Large File Summit, please see http://www.sas.com/standards/large.file.


1.0 Overview

1.1 The Large File Problem

As UNIX systems have become increasingly powerful, a number of UNIX system vendors and independent software vendors have come to need access to files that contain more information than can be addressed using a signed long integer (2GB on existing 32-bit systems).

A number of major system vendors and users met at the "Large File Summit" (LFS) for over a year to develop a set of changes to the existing Single UNIX Specification (SUS) that allow both new and converted programs to address files of arbitrary sizes. This set of changes was provided to X/Open for inclusion into the next version of the SUS. In addition, a set of transitional extensions intended to permit users to immediately implement large file support on typical 32-bit UNIX operating systems was proposed. This abridged document only contains the identified changes to the SUS document and the accompanying rationale.

2.0 Changes to the Single UNIX Specification

2.1 Changes to CAE Specification System Interface Definitions, Issue 4, Version 2

The following definitions will be added to System Interface Definitions, Chapter 2, Glossary:
extended signed integral type
a signed integral type or an implementation-specific type with similar properties.
extended unsigned integral type
an unsigned integral type or an implementation-specific type with similar properties.
offset maximum
an attribute of an open file description representing the largest value that can be used as a file offset.
saved resource limits
an attribute of a process that provides some flexibility in the handling of unrepresentable resource limits, as described in the exec family of functions and setrlimit().

(Note that the attribute "resource limits" as used in the SUS is not defined.)

2.2 Changes to CAE Specification System Interfaces and Headers, Issue 4, Version 2

2.2.1 Changes to System Interfaces

The following changes will be made to System Interfaces and Headers, Chapter 3, System Interfaces. The Asynchronous I/O interfaces (aio_read(), aio_write() and lio_listio()) should be included when POSIX.1b is added in a future revision of the SUS.

2.2.1.1 aio_read()

DESCRIPTION
For regular files, no data transfer will occur past the offset maximum established in the open file description associated with aiocbp->aio_fildes.
ERRORS
The following is an additional condition which may be detected synchronously or asynchronously:
[EOVERFLOW]
The file is a regular file, aiocbp->aio_nbytes is greater than 0 and the starting offset in aiocbp->aio_offset is before the end-of-file and is at or beyond the offset maximum in the open file description associated with aiocbp->aio_fildes.
Note: This is a new error condition.

2.2.1.2 aio_write()

DESCRIPTION
For regular files, no data transfer will occur past the offset maximum established in the open file description associated with aiocbp->aio_fildes.
ERRORS
The following is an additional condition which may be detected synchronously or asynchronously:
[EFBIG]
The file is a regular file, aiocbp->aio_nbytes is greater than 0 and the starting offset in aiocbp->aio_offset is at or beyond the offset maximum in the open file description associated with aiocbp->aio_fildes.
Note: This is an additional EFBIG error condition.

2.2.1.3 exec

DESCRIPTION
The saved resource limits in the new process image are set to be a copy of the process's corresponding hard and soft resource limits.

2.2.1.4 fclose(), fflush(), fputwc(), fputws(), fseek(), putwc(), putwchar()

ERRORS
These functions will fail if:
[EFBIG]
The file is a regular file and an attempt was made to write at or beyond the offset maximum associated with the corresponding stream.
Note: This is an additional EFBIG error condition.

2.2.1.5 fcntl()

DESCRIPTION
An unlock (F_UNLCK) request in which l_len is non-zero and the offset of the last byte of the requested segment is the maximum value for an object of type off_t, when the process has an existing lock in which l_len is 0 and which includes the last byte of the requested segment, will be treated as a request to unlock from the start of the requested segment with an l_len equal to 0. Otherwise an unlock (F_UNLCK) request will attempt to unlock only the requested segment.
ERRORS
The fcntl() function will fail if:
[EOVERFLOW]
One of the values to be returned cannot be represented correctly.
[EOVERFLOW]
The cmd argument is F_GETLK, F_SETLK or F_SETLKW and the smallest or, if l_len is non-zero, the largest, offset of any byte in the requested segment cannot be represented correctly in an object of type off_t.
Note: These are new error conditions.

2.2.1.6 fdopen()

DESCRIPTION
The fdopen() function will preserve the offset maximum previously set for the open file description corresponding to fildes.

2.2.1.7 fgetc(), fgets(), fgetwc(), fgetws(), fread(), fscanf(), getc(), getchar(), gets(), getw(), getwc(), getwchar(), scanf()

ERRORS
These functions will fail if data needs to be read and:
[EOVERFLOW]
The file is a regular file and an attempt was made to read at or beyond the offset maximum associated with the corresponding stream.
Note: This is a new error condition.

2.2.1.8 fgetpos()

ERRORS
The fgetpos() function will fail if:
[EOVERFLOW]
The current value of the file position cannot be represented correctly in an object of type fpos_t.
Note: This is a new error condition.

2.2.1.9 fopen(), freopen(), tmpfile()

DESCRIPTION
The largest value that can be represented correctly in an object of type off_t will be established as the offset maximum in the open file description.
ERRORS
The fopen() and freopen() functions will fail if:
[EOVERFLOW]
The named file is a regular file and the size of the file cannot be represented correctly in an object of type off_t.
Note: This is a new error condition.

2.2.1.10 fpathconf() and pathconf()

DESCRIPTION
  Variable          Value of name          Notes
  FILESIZEBITS      _PC_FILESIZEBITS       3,4

2.2.1.11 fprintf(), fputc(), fputs(), fwrite(), printf(), putc(), putchar(), puts(), putw(), vfprintf(), vprintf()

ERRORS
These functions will fail if either the stream is unbuffered or the stream's buffer needed to be flushed and:
[EFBIG]
The file is a regular file and an attempt was made to write at or beyond the offset maximum.
Note: This is an additional EFBIG error condition.

2.2.1.12 fseek()

ERRORS
The fseek() function will fail if:
[EOVERFLOW]
The resulting file offset would be a value which cannot be represented correctly in an object of type long.
Note: This is a new error condition.

2.2.1.13 fseeko()

DESCRIPTION
The fseeko() function is identical to the modified fseek() except that the offset argument is of type off_t and the EOVERFLOW error is changed as follows:
ERRORS
[EOVERFLOW]
The resulting file offset would be a value which cannot be represented correctly in an object of type off_t.
Note: This is a new function.

2.2.1.14 fstat(), lstat() and stat()

ERRORS
These functions will fail if:
[EOVERFLOW]
The file size in bytes or the number of blocks allocated to the file or the file serial number cannot be represented correctly in the structure pointed to by buf.
Note: This is an additional EOVERFLOW error condition.

2.2.1.15 fstatvfs() and statvfs()

ERRORS
These functions will fail if:
[EOVERFLOW]
One of the values to be returned cannot be represented correctly in the structure pointed to by buf.
Note: This is a new error condition.

2.2.1.16 ftell()

ERRORS
The ftell() function will fail if:
[EOVERFLOW]
The current file offset cannot be represented correctly in an object of type long.
Note: This is a new error condition.

2.2.1.17 ftello()

DESCRIPTION
The ftello() function is identical to the modified ftell() except that the return value is of type off_t and the EOVERFLOW error is changed as follows:
ERRORS
[EOVERFLOW]
The current file offset cannot be represented correctly in an object of type off_t.
Note: This is a new function.
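To illustrate how fseeko() and ftello() are meant to be used together, here is a minimal C sketch that measures the size of the file behind a stream and then restores the original position. The helper name stream_size() is hypothetical. The sketch assumes a system that declares fseeko() and ftello() in <stdio.h> as described in 2.2.2.2. An offset that does not fit in off_t shows up as a -1 return with errno set (EOVERFLOW under the rules above).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: store the size of the file behind fp in *size and
 * restore the stream's original position. Returns 0 on success, -1 on
 * failure (errno is EOVERFLOW if an offset does not fit in off_t). */
int stream_size(FILE *fp, off_t *size)
{
    off_t here, end;

    here = ftello(fp);                  /* remember where we are */
    if (here == (off_t)-1)
        return -1;
    if (fseeko(fp, 0, SEEK_END) != 0)   /* a relative seek, using off_t */
        return -1;
    end = ftello(fp);
    if (end == (off_t)-1)
        return -1;
    if (fseeko(fp, here, SEEK_SET) != 0)
        return -1;
    *size = end;
    return 0;
}
```

Unlike fseek() and ftell(), nothing here passes through a long, so the only limit is the range of off_t.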

2.2.1.18 ftruncate()

ERRORS
The ftruncate() function will fail if:
[EFBIG]
The file is a regular file and length is greater than the offset maximum established in the open file description associated with fildes.
Note: This is an additional EFBIG error condition.

2.2.1.19 getrlimit() and setrlimit()

DESCRIPTION
When using the getrlimit() function, if a resource limit can be represented correctly in an object of type rlim_t then its representation is returned; otherwise if the value of the resource limit is equal to that of the corresponding saved hard limit the value returned is RLIM_SAVED_MAX; otherwise the value returned is RLIM_SAVED_CUR.

When using the setrlimit() function, if the requested new limit is RLIM_INFINITY the new limit will be "no limit"; otherwise if the requested new limit is RLIM_SAVED_MAX the new limit will be the corresponding saved hard limit; otherwise if the requested new limit is RLIM_SAVED_CUR the new limit will be the corresponding saved soft limit; otherwise the new limit will be the requested value. In addition, if the corresponding saved limit can be represented correctly in an object of type rlim_t then it will be overwritten with the new limit.

The result of setting a limit to RLIM_SAVED_MAX or RLIM_SAVED_CUR is unspecified unless a previous call to getrlimit() returned that value as the soft or hard limit for the corresponding resource limit.

The determination of whether a limit can be correctly represented in an object of type rlim_t is implementation-dependent. For example, some implementations permit a limit whose value is greater than RLIM_INFINITY and others do not.

The exec family of functions also causes resource limits to be saved (see 2.2.1.3 exec).

2.2.1.20 lio_listio()

DESCRIPTION
For regular files, no data transfer will occur past the offset maximum established in the open file description associated with aiocbp->aio_fildes.
ERRORS
The following are additional error codes which may be set for each aiocb control block:
[EOVERFLOW]
The aiocbp->aio_lio_opcode is LIO_READ, the file is a regular file, aiocbp->aio_nbytes is greater than 0, and the aiocbp->aio_offset is before the end-of-file and is greater than or equal to the offset maximum in the open file description associated with aiocbp->aio_fildes.
[EFBIG]
The aiocbp->aio_lio_opcode is LIO_WRITE, the file is a regular file, aiocbp->aio_nbytes is greater than 0, and the aiocbp->aio_offset is greater than or equal to the offset maximum in the open file description associated with aiocbp->aio_fildes.
Note: These are additional EFBIG and EOVERFLOW error conditions.

2.2.1.21 lockf()

DESCRIPTION
An F_ULOCK request in which size is non-zero and the offset of the last byte of the requested section is the maximum value for an object of type off_t, when the process has an existing lock in which size is 0 and which includes the last byte of the requested section, will be treated as a request to unlock from the start of the requested section with a size equal to 0. Otherwise an F_ULOCK request will attempt to unlock only the requested section.

ERRORS
The lockf() function will fail if:
[EINVAL]
The function argument is not one of F_LOCK, F_TLOCK, F_TEST or F_ULOCK; or size plus the current file offset is less than 0.
[EOVERFLOW]
The offset of the first, or if size is not 0 then the last, byte in the requested section cannot be represented correctly in an object of type off_t.
Note: This is a clarification of the EINVAL error condition.
Note: EOVERFLOW is a new error condition.

2.2.1.22 lseek()

ERRORS
The lseek() function will fail if:
[EOVERFLOW]
The resulting file offset would be a value which cannot be represented correctly in an object of type off_t.
Note: This is a new error condition.

2.2.1.23 mmap()

ERRORS
The mmap() function will fail if:
[EOVERFLOW]
The file is a regular file and the value of off plus len exceeds the offset maximum established in the open file description associated with fildes.
Note: This is a new error condition.

2.2.1.24 open()

DESCRIPTION
The largest value that can be represented correctly in an object of type off_t will be established as the offset maximum in the open file description.

ERRORS
The open() function will fail if:
[EOVERFLOW]
The named file is a regular file and the size of the file cannot be represented correctly in an object of type off_t.
Note: This is a new error condition.

2.2.1.25 read() and readv()

DESCRIPTION
For regular files, no data transfer will occur past the offset maximum established in the open file description associated with fildes.

ERRORS
The read() and readv() functions will fail if:
[EOVERFLOW]
The file is a regular file, nbyte is greater than 0, the starting position is before the end-of-file and the starting position is greater than or equal to the offset maximum established in the open file description associated with fildes.
Note: This is a new error condition.

2.2.1.26 readdir()

ERRORS
The readdir() function will fail if:
[EOVERFLOW]
One of the values in the structure to be returned cannot be represented correctly.
Note: This is a new error condition.

2.2.1.27 write() and writev()

DESCRIPTION
For regular files, no data transfer will occur past the offset maximum established in the open file description associated with fildes.
ERRORS
These functions will fail if:
[EFBIG]
The file is a regular file, nbyte is greater than 0 and the starting position is greater than or equal to the offset maximum established in the open file description associated with fildes.
Note: This is an additional EFBIG error condition.

2.2.2 Changes to Headers

The following changes will be made to System Interfaces and Headers, Chapter 4, Headers.

2.2.2.1 <limits.h>

The following symbolic constant is defined as a Pathname Variable Value:
Name             Description                Acceptable Value
FILESIZEBITS     Minimum number of bits             *
                 needed to represent,
                 as a signed integer
                 value, the maximum size
                 of a regular file
                 allowed in the
                 specified directory.

2.2.2.2 <stdio.h>

The following are declared as functions and may also be defined as macros:
int         fseeko(FILE *stream, off_t offset, int whence);
off_t       ftello(FILE *stream);
The type off_t is defined through typedef as described in <sys/types.h>.

2.2.2.3 <sys/resource.h>

The following symbolic constants are defined:
RLIM_SAVED_MAX     A value of type rlim_t indicating an
                   unrepresentable saved hard limit.
RLIM_SAVED_CUR     A value of type rlim_t indicating an
                   unrepresentable saved soft limit.
On implementations where all resource limits are representable in an object of type rlim_t, RLIM_SAVED_MAX and RLIM_SAVED_CUR need not be distinct from RLIM_INFINITY.

2.2.2.4 <sys/stat.h>

The type of st_blocks in the stat structure will be changed to:
blkcnt_t    st_blocks   number of blocks allocated for this
                        object.

2.2.2.5 <sys/statvfs.h>

The types of the fields below in the statvfs structure will be changed to:
fsblkcnt_t  f_blocks    total number of blocks in the file
                        system in units of f_frsize.
fsblkcnt_t  f_bfree     total number of free blocks.
fsblkcnt_t  f_bavail    number of free blocks available to
                        non-privileged process.
fsfilcnt_t  f_files     total number of file serial numbers.
fsfilcnt_t  f_ffree     total number of free file serial
                        numbers.
fsfilcnt_t  f_favail    number of free file serial numbers
                        available to non-privileged process.
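The following C sketch reads the widened statvfs fields. The helper avail_bytes() is hypothetical. It converts f_bavail (an fsblkcnt_t) and f_frsize to uintmax_t before multiplying, so the arithmetic is done in the widest unsigned type. The sketch assumes the product itself does not overflow uintmax_t.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper: bytes available to a non-privileged process on the
 * file system containing path. Returns 0 on success, -1 on failure; under
 * the rules in 2.2.1.15, statvfs() fails with EOVERFLOW if a count cannot
 * be represented in struct statvfs. */
int avail_bytes(const char *path, uintmax_t *bytes)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
    /* f_bavail is counted in units of f_frsize; widen before multiplying.
     * Overflow of the product is not checked in this sketch. */
    *bytes = (uintmax_t)sv.f_bavail * (uintmax_t)sv.f_frsize;
    return 0;
}
```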

2.2.2.6 <sys/types.h>

The following data types will be defined:
blkcnt_t                Used for file block counts.
fsblkcnt_t              Used for file system block counts.
fsfilcnt_t              Used for file system file counts.

The types blkcnt_t and off_t are defined as extended signed integral types.

The types fsblkcnt_t, fsfilcnt_t, and ino_t are defined as extended unsigned integral types.
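Because these types may be wider than long, code that prints them cannot assume a %ld or %lu conversion. One common approach is to convert to intmax_t or uintmax_t and use %jd or %ju. The C sketch below shows this approach. It assumes the types are no wider than intmax_t/uintmax_t, which C99 guarantees for standard and extended integer types. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: format an off_t (extended signed integral type) as
 * decimal text. Returns the snprintf() result (length of the full text). */
int format_off(char *buf, size_t len, off_t off)
{
    return snprintf(buf, len, "%jd", (intmax_t)off);
}

/* Hypothetical helper: the same idea for an extended unsigned integral
 * type such as fsblkcnt_t, widened to uintmax_t and printed with %ju. */
int format_blocks(char *buf, size_t len, fsblkcnt_t n)
{
    return snprintf(buf, len, "%ju", (uintmax_t)n);
}
```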

2.2.2.7 <unistd.h>

The following symbolic constant is defined for pathconf():
_PC_FILESIZEBITS

2.3 Changes to CAE Specification Commands and Utilities, Issue 4, Version 2

The following changes will be made to Commands and Utilities, Chapter 3, Utilities.

2.3.1 Considerations for Utilities in Support of Files of Arbitrary Size

Note: This is a new section and should be added to Commands and Utilities, Issue 4, Version 2, Chapter 3 after section 1.2.1, Symbolic Links.

The following utilities will support files of any size up to the maximum that can be created by the implementation. This support includes correct writing of file size related values (such as file sizes and offsets, line numbers, and block counts) and correct interpretation of command line arguments that contain such values.

basename   return non-directory portion of pathname
cat        concatenate and print files
cd         change working directory
chgrp      change file group ownership
chmod      change file modes
chown      change file ownership
cksum      write file checksums and sizes
cmp        compare two files
cp         copy files
dd         convert and copy a file
df         report free disk space
dirname    return directory portion of pathname
du         estimate file space usage
find       find files
ln         link files
ls         list directory contents
mkdir      make directories
mv         move files
pathchk    check pathnames
pwd        return working directory name
rm         remove directory entries
rmdir      remove directories
sh         shell, the standard command language interpreter
sum        print checksum and block or byte count of a file
test       evaluate expression
touch      change file access and modification times
ulimit     set or report file size limit
Exceptions to the requirement that utilities support files of any size up to the maximum are:
  1. Utilities such as tar and cpio cannot support arbitrary file sizes due to limitations imposed by fixed file formats.
  2. Uses of files as command scripts, or for configuration or control, are exempt. For example, it is not required that sh be able to read an arbitrarily large ".profile".
  3. Shell input and output redirection are exempt. For example, it is not required that the redirections sum < file or echo foo > file succeed for an arbitrarily large existing file.

2.3.2 The sh Utility

DESCRIPTION:
Pathname expansion will not fail due to the size of a file.

Shell input and output redirections will have an implementation-specific offset maximum that will be established in the open file description.

2.3.3 The pax Utility

APPLICATION USAGE
The pax utility is not able to handle arbitrary file sizes. There is currently a proposal in ballot in IEEE Project 1003.2b to address this issue.

Appendix A: Rationale and Notes

A.1 Overview

The reader is referred to http://www.sas.com/standards/large.file for the full rationale for this section. Only the rationale relevant to the Single UNIX Specification is included in this abridged paper.

A.2 Changes to the Single UNIX Specification

A.2.1 Changes to CAE Specification System Interfaces and Headers, Issue 4, Version 2

A.2.1.1 Changes to System Interfaces

A.2.1.1.1 Notes on Functions not Modified by this Proposal

The following functions do not require modification to meet the terms of this proposal:
aio_error(), aio_cancel(), aio_return() and aio_suspend()
No large file implications were identified for these functions.
aio_fsync()
It is possible that an aio_fsync() could try to write out file blocks that are beyond the offset maximum, just as fsync() could. There is no compelling reason for either to fail. Clearly, the original write request had to be within the offset maximum for the file description used. The aio_fsync() function will not enforce the offset maximum on the blocks which it writes out.
glob() and wordexp()
The subroutines that expand file name wild cards need to be large file capable.

A.2.1.1.2 aio_read()

The aio_read() function enforces the offset maximum rules for consistency with read() and readv().

A.2.1.1.3 aio_write()

The aio_write() function enforces the offset maximum rules for consistency with write() and writev().

A.2.1.1.4 creat()

The creat() function will fail if the named file is a regular file and the size of the file cannot be represented correctly in an object of type off_t (see 2.2.1.24 open()). This offers protection from the following coding style:
     if (stat(path, ...) < 0) {
         /* assume file does not exist, so create it */
         if ((fd = creat(path, ...)) < 0) {
            /* print out error text */
         }
     }
In this example the stat() function is being used to determine the existence of a file. But if the file size cannot be represented correctly in an object of type off_t then stat() will fail (see 2.2.1.14 fstat(), lstat() and stat()) and if creat() did not then fail it would have the unintended effect of truncating the file to 0 length. Many applications and standard utilities have code similar to this example, including typical implementations of the touch utility.
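An application does not have to rely on creat() for this protection. It can instead treat only ENOENT from stat() as meaning "the file does not exist". The following is a minimal C sketch of that idea. create_if_absent() is a hypothetical name. The use of O_EXCL, which closes the window between stat() and creation, is an addition that this paper does not discuss.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create path only if it does not already exist.
 * Returns 1 if the file was created, 0 if it already existed, -1 on error.
 * A stat() failure other than ENOENT (for example EOVERFLOW for a file too
 * large for this program's off_t) is an error, not "file absent". */
int create_if_absent(const char *path)
{
    struct stat sb;
    int fd;

    if (stat(path, &sb) == 0)
        return 0;                   /* exists: leave it untouched */
    if (errno != ENOENT)
        return -1;                  /* e.g. EOVERFLOW or EACCES */

    /* O_EXCL makes creation fail, rather than truncate, if the file
     * appeared after the stat() call. */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd == -1)
        return (errno == EEXIST) ? 0 : -1;
    close(fd);
    return 1;
}
```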

A.2.1.1.5 fcntl() and lockf()

Unlock requests are sometimes "rounded to infinity" so that a process can create a whole-file lock and then successfully issue a request to clip off the beginning of the lock without leaving behind an unrepresentable lock. This is to avoid breaking any existing 32-bit applications which might happen to do this.

Several existing implementations of fcntl() permit locking the byte whose offset is the maximum value that can be represented correctly in an object of type off_t, even though write() cannot write to that offset. This specification permits that behavior.

The fcntl() function will fail if the cmd argument is F_GETLK and the first lock which blocks the lock description has a starting offset or length which cannot be represented correctly in an object of type off_t. Information about such a lock cannot be correctly returned.

Discussion of the semantics of fcntl() locks that cross the off_t boundary resulted in six competing proposals:

  1. An unlock request fails if it would create an unrepresentable lock.
  2. If any lock request includes the byte whose offset is the maximum value that fits in an off_t, then the request is equivalent to a request where l_len is 0 and l_start refers to the first byte of the affected area.
  3. (proposal was dropped)
  4. If l_len is 0 then the lock is through and including the maximum value of off_t (and not beyond).
  5. Just no lies.
  6. If an unlock request includes the byte whose offset is the maximum value that fits in an off_t, and there is an existing lock with l_len equal to 0 which also includes that byte, then the request is equivalent to a request where l_len is 0 and l_start refers to the first byte of the affected area.

An advantage of 2, 4, and 6 is that they do not change existing behavior of a 32-bit application.

Proposals 1 and 5 can result in a new type of failure in the case where the program creates a lock with l_len equal to 0 and then clips off the beginning leaving behind an unrepresentable lock.

Proposal 4 precludes truly "whole file" locking.

Proposal 6 was adopted because it preserves existing 32-bit behavior and is less disruptive than proposal 2 (which extends lock requests in addition to unlock requests).

The fcntl() and lockf() functions will fail if the offset of the first byte in the region, or, if l_len (size) is non-zero, the offset of the last byte in the region, exceeds the largest possible value in an object of type off_t. Otherwise the process could create a lock that would be "beyond" the ability of the program to represent.
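The unlock rule adopted above (proposal 6, specified in 2.2.1.5) can be exercised with a short C sketch. The sketch first takes a whole-file lock (l_len equal to 0). It then issues an unlock whose last byte is the maximum off_t value, which the rule treats as an unlock from l_start with l_len equal to 0, so that no lock is left behind on unrepresentable offsets. The helper names and the OFF_T_MAX macro are hypothetical. OFF_T_MAX assumes off_t is a two's-complement type without padding bits, and the descriptor is assumed to be open for writing.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Largest off_t value (assumes two's complement, no padding bits). */
#define OFF_T_MAX ((off_t)(((uintmax_t)1 << (sizeof(off_t) * CHAR_BIT - 1)) - 1))

/* Hypothetical helper: apply a lock of the given type to [start, start+len);
 * len == 0 means "to the end of the file and beyond". */
static int set_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: lock the whole file, then keep only [0, keep) by
 * unlocking [keep, OFF_T_MAX]. Returns 0 on success, -1 on failure. */
int lock_then_trim(int fd, off_t keep)
{
    off_t len;

    if (set_lock(fd, F_WRLCK, 0, 0) == -1)
        return -1;
    /* Length of [keep, OFF_T_MAX], written so it cannot overflow; for
     * keep == 0 that length is unrepresentable, so use l_len == 0. */
    len = (keep == 0) ? 0 : OFF_T_MAX - (keep - 1);
    return set_lock(fd, F_UNLCK, keep, len);
}
```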

A.2.1.1.6 fgetpos(), fseek(), ftell(), lseek()

These functions will fail if the resulting file offset would exceed the largest value that can be represented correctly in the related type which is in use for the call, and will set errno to EOVERFLOW (permitted by PASC Interpretation 1003.1-90 #75).

Programs typically, but incorrectly, fail to check the return value of these functions, which renders the error return less useful. On the other hand, returning an incorrect offset can result in serious malfunction as well.

An lseek() to the end of a file using

     lseek(fd, 0, SEEK_END);
is quite common. It is unfortunate that such calls fail on a file that is too large, since the return value is usually ignored. One alternative that was considered was for lseek() to move the file offset for all valid requests and then return an error if the resulting offset is too large. That is, the call would in effect succeed for applications that do not check the return code, but report failure to applications that do. This option was deemed too bizarre to adopt. For example, it might be difficult to implement using a remote procedure call system that was constructed to return either results or an error, but not both. In addition, the POSIX 1003.1 standard requires the file offset to remain unchanged if an error is returned by lseek().
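A checked version of this common idiom might look like the following minimal C sketch. The helper name seek_to_end() is hypothetical. The point it illustrates is that, under the rules in 2.2.1.22, lseek() returns (off_t)-1 with errno set to EOVERFLOW when the resulting offset does not fit in off_t, and the caller has to test for that.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: seek fd to end-of-file and return the new offset
 * (the file size), or -1 after reporting why the seek failed. */
off_t seek_to_end(int fd)
{
    off_t end = lseek(fd, 0, SEEK_END);

    if (end == (off_t)-1) {
        if (errno == EOVERFLOW)
            fprintf(stderr, "file too large for this program's off_t\n");
        else
            perror("lseek");
    }
    return end;
}
```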

Another potentially serious consequence of ignoring the return value of lseek() is that programs which extend data files by attempting to seek beyond the end-of-file and then writing may instead overwrite existing data.

For example, typical implementations of the dbm and ndbm libraries contain code such as:

     (void) lseek(db->dbm_pagf, blkno*PBLKSIZ, L_SET);
     if (write(db->dbm_pagf, pagebuf, PBLKSIZ) != PBLKSIZ)
                ... error handling ...

The problem is that the return code of lseek() is not checked, so if "blkno*PBLKSIZ" overflows, the lseek() will fail (or will seek to an unintended offset) and the data will be written to an unintended offset.
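The following C sketch checks both the offset arithmetic and the lseek() result before writing. It is not the dbm code itself, and it makes three assumptions. First, PBLKSIZ is given an assumed value of 1024 for illustration. Second, write_page() is a hypothetical name. Third, OFF_T_MAX is a locally defined macro, not a standard one, and it assumes off_t is a two's-complement type without padding bits.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PBLKSIZ 1024    /* assumed page size, for illustration only */

/* Largest off_t value (assumes two's complement, no padding bits). */
#define OFF_T_MAX ((off_t)(((uintmax_t)1 << (sizeof(off_t) * CHAR_BIT - 1)) - 1))

/* Hypothetical helper: write page number blkno of a page file.
 * Returns 0 on success, -1 on failure with errno set. */
int write_page(int fd, long blkno, const char *pagebuf)
{
    off_t off;

    if (blkno < 0) {
        errno = EINVAL;
        return -1;
    }
    /* Refuse page numbers whose byte offset would not fit in off_t. */
    if ((uintmax_t)blkno > (uintmax_t)(OFF_T_MAX / PBLKSIZ)) {
        errno = EOVERFLOW;
        return -1;
    }
    off = (off_t)blkno * PBLKSIZ;
    if (lseek(fd, off, SEEK_SET) == (off_t)-1)  /* checked, unlike above */
        return -1;
    if (write(fd, pagebuf, PBLKSIZ) != PBLKSIZ)
        return -1;
    return 0;
}
```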

A.2.1.1.7 fpathconf() and pathconf()

The reference "See Note 3,4" refers to notes in the X/Open specification for fpathconf() and pathconf(). These notes indicate that this option (_PC_FILESIZEBITS) is valid only for a directory, and the results are for files that exist or may be created in that directory.

The _PC_FILESIZEBITS option makes it possible for a process to determine how large a file can be created in a given directory. It takes into account implementation limitations in the file system (e.g. due to the size of file size and block count variables), and it takes into account long term policy limitations (e.g. due to the mount utility's -o nolargefiles option). It does not take into account dynamic restrictions such as the RLIM_FSIZE resource limit or the number of available file blocks, so the process must perform appropriate checks.

When the current directory is on a typical large file capable file system and is mounted with the -o nolargefiles option,

     pathconf(".", _PC_FILESIZEBITS);
will return 32. In general, if the maximum size file that could ever exist on the mounted file system is maxsize then the returned value is 2 plus the floor of the base 2 logarithm of maxsize.
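The relationship between the largest possible file size and the returned value can be checked with a short C sketch. filesizebits_for() is a hypothetical helper that applies the "2 plus the floor of the base 2 logarithm" rule stated above. For a maximum size of 2^31 - 1 bytes, the largest size a signed 32-bit offset can express, it yields 32, which matches the nolargefiles example. query_filesizebits() is a hypothetical wrapper around the pathconf() call shown above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: the FILESIZEBITS value for a file system whose
 * largest possible file is maxsize bytes, i.e. 2 + floor(log2(maxsize)).
 * maxsize must be at least 1. */
int filesizebits_for(uintmax_t maxsize)
{
    int floor_log2 = -1;

    while (maxsize != 0) {  /* floor(log2(x)) = (number of bits in x) - 1 */
        maxsize >>= 1;
        floor_log2++;
    }
    return 2 + floor_log2;
}

/* Hypothetical wrapper: ask the implementation for FILESIZEBITS in dir.
 * Returns -1 if the value is indeterminate (errno left at 0) or on error
 * (errno set by pathconf()). */
long query_filesizebits(const char *dir)
{
    errno = 0;
    return pathconf(dir, _PC_FILESIZEBITS);
}
```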

A.2.1.1.8 fseeko() and ftello()

These functions are needed because fseek() and ftell() are limited by the long offset type required by ISO C. The fsetpos() and fgetpos() functions, although they do use an opaque offset type, are not complete replacements for fseek() and ftell() because they do not allow relative seeks or arithmetic on fpos_t values.
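fseeko() is useful for tasks such as reading the tail of a large file, because it can seek relative to the end of the file, which fsetpos() cannot do. A minimal C sketch follows. read_tail() is a hypothetical helper, error handling is reduced to returning 0, and n is assumed to fit in off_t.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: read up to n bytes from the end of fp into buf.
 * Returns the number of bytes read (0 on any error). The seek is relative
 * to SEEK_END, which cannot be expressed with an opaque fpos_t. */
size_t read_tail(FILE *fp, char *buf, size_t n)
{
    off_t size, want;

    if (fseeko(fp, 0, SEEK_END) != 0)
        return 0;
    size = ftello(fp);
    if (size == (off_t)-1)
        return 0;                   /* e.g. EOVERFLOW */
    /* Clamp so we never seek before the start of the file. */
    want = (size < (off_t)n) ? size : (off_t)n;
    if (fseeko(fp, -want, SEEK_END) != 0)
        return 0;
    return fread(buf, 1, (size_t)want, fp);
}
```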

A.2.1.1.9 fsetpos()

Since fsetpos() sets an absolute file position, which is always legal regardless of the implementation-supported sizes of off_t, there are no new error returns or other new semantics.

A.2.1.1.10 fstatvfs() and statvfs()

These functions will fail if the total, or free, or available number of blocks or files cannot be represented correctly in the structure to be returned (f_blocks, f_bfree, f_bavail, f_files, f_ffree, f_favail).

A.2.1.1.11 ftruncate(), truncate(), unlink()

These functions are used only on pre-existing files and so do not present the programming hazard that creat() does (see A.2.1.1.4 creat()).

When ftruncate() is used to increase the size of a file, the semantics are similar to a write() of zeroes to the file. For consistency with write(), the ftruncate() function will fail when the request is beyond the offset maximum (even if the effect of the request would be to shorten the file).

A.2.1.1.12 ftw() and nftw()

The ftw() and nftw() functions may fail if a stat() in the underlying implementation fails with EOVERFLOW. This is unfortunate because "small" binaries using these functions cannot reasonably be used on file trees containing "large" files. Some systems have a non-standard extension to nftw() which permits it to continue when stat() fails (typical failures also include ESTALE and ELOOP).

A.2.1.1.13 getrlimit() and setrlimit()

These functions map limits that they cannot represent correctly to and from RLIM_SAVED_MAX and RLIM_SAVED_CUR. These values do not require any special handling by programs. They may be thought of as tokens that the kernel hands out to programs that can't handle the real answer, and that remind the kernel, when the tokens come back from the user, of what value is really meant.

If setrlimit() fails for any reason (for example, EPERM), the resource limits and saved resource limits remain unchanged.

This proposal does not specify any particular value for RLIM_INFINITY, RLIM_SAVED_MAX or RLIM_SAVED_CUR. Typical current implementations use the value 0x7FFFFFFF for RLIM_INFINITY, and it is recommended that RLIM_SAVED_MAX and RLIM_SAVED_CUR have similar large values.

Few, if any, programs will need to refer explicitly to RLIM_SAVED_MAX or RLIM_SAVED_CUR. Those that do should not use them in C-language switch cases since they may have the same value in some implementations (see 2.2.2.3 <sys/resource.h>).
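The following C sketch follows that advice. The classification uses an if/else chain instead of a switch. On any system where RLIM_INFINITY, RLIM_SAVED_MAX and RLIM_SAVED_CUR share a value, a switch with a case label for each would fail to compile because of duplicate case values. The enum and function names are hypothetical. As far as we are aware, glibc on Linux defines all three as the same value, so there only the first branch can match any of them.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>

/* Hypothetical classification of a value returned by getrlimit(). */
enum limit_kind { LIMIT_NONE, LIMIT_SAVED_MAX, LIMIT_SAVED_CUR, LIMIT_VALUE };

enum limit_kind classify_limit(rlim_t v)
{
    /* Tested in order; the three symbols need not be distinct. */
    if (v == RLIM_INFINITY)
        return LIMIT_NONE;
    if (v == RLIM_SAVED_MAX)
        return LIMIT_SAVED_MAX;
    if (v == RLIM_SAVED_CUR)
        return LIMIT_SAVED_CUR;
    return LIMIT_VALUE;
}
```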

A limit can be represented correctly in an object of type rlim_t if it is either "no limit", which is represented by RLIM_INFINITY, or a value that is not equal to RLIM_INFINITY, RLIM_SAVED_MAX or RLIM_SAVED_CUR, that fits in an object of type rlim_t, and that meets any additional implementation-specific criteria for correct representation.

A rejected alternative proposal was to map limits that could not be represented to and from RLIM_INFINITY. This would avoid the need for the new symbols RLIM_SAVED_MAX and RLIM_SAVED_CUR. But such mapping would arguably be a lie, and the resulting information loss would cause unintuitive program behavior, especially in programs running with appropriate privileges needed to raise hard limits.

A rejected alternative proposal was that if getrlimit() could not correctly return a current limit, it should instead return -1 and set errno to EOVERFLOW. But that would result in unnecessary breakage of programs. (Note that this breakage occurs even when no large files are present.) It would also result in malfunction of programs that assume they are calling getrlimit() properly and that failure therefore "cannot happen". For example, in the 4.4 BSD-Lite distribution, there are at least 15 unchecked calls to getrlimit(). When the 4.4 BSD csh limit built-in command is used to report the current limits, the return code is not checked, so the reported results can be entirely incorrect. Also, non-superuser programs typically raise their soft limit to the hard limit ("unlimit themselves") with:

     getrlimit(RLIMIT_STACK, &rl);
     rl.rlim_cur = rl.rlim_max;
     setrlimit(RLIMIT_STACK, &rl);
If getrlimit() fails, garbage is passed to setrlimit(), which may result in an unwanted and extremely restrictive limit. Several utilities that are part of the GNU C compiler have this problem.
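For contrast, a minimal sketch of the same idiom with both return values checked, so that setrlimit() is never handed an uninitialized structure. The function name raise_stack_limit() is hypothetical.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft stack limit to the hard limit.
   Returns 0 on success, -1 on failure (limits left unchanged). */
int raise_stack_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) == -1) {
        perror("getrlimit");
        return -1;              /* rl is garbage: do not pass it on */
    }
    rl.rlim_cur = rl.rlim_max;  /* may be a token such as RLIM_SAVED_MAX */
    if (setrlimit(RLIMIT_STACK, &rl) == -1) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Copying rlim_max into rlim_cur needs no special handling even if the value is a saved-limit token, which is the point of the token scheme.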

A.2.1.1.14 lio_listio()

The lio_listio() function enforces the offset maximum rules, since it is logically equivalent to a series of aio_read() and aio_write() calls, which enforce them.

A.2.1.1.15 mmap()

For consistency with read() and write(), the mmap() function will fail when the request extends beyond the offset maximum.

A.2.1.1.16 open()

The open() function called with O_TRUNC set will fail without truncating the file if the named file is a regular file whose size cannot be represented correctly in an object of type off_t (see A.2.1.1.4 creat()).
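A minimal sketch of what a caller sees, assuming a program built with a "small" off_t: open() fails with EOVERFLOW and the file keeps its contents. The function name truncate_to_empty() and its messages are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: truncate path to zero length. Returns 0 on success,
   -1 on failure. An EOVERFLOW failure means the file was too large for this
   program's off_t and, per the rule above, was left untruncated. */
int truncate_to_empty(const char *path)
{
    int fd = open(path, O_WRONLY | O_TRUNC);

    if (fd == -1) {
        if (errno == EOVERFLOW)
            fprintf(stderr, "%s: too large for this program; not truncated\n",
                    path);
        else
            fprintf(stderr, "%s: %s\n", path, strerror(errno));
        return -1;
    }
    return close(fd);
}
```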

A.2.1.1.17 read(), readv(), write() and writev()

These functions may perform a "partial read or write" because of the offset maximum. That is, the value returned may be less than nbyte if fewer than nbyte bytes can be transferred before the offset maximum is reached.
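Callers that must transfer a whole buffer should therefore loop. The following is a minimal sketch for write(), with the hypothetical name write_all(). Once the offset maximum is reached, the next write() on a regular file fails with EFBIG. The loop then stops and reports how much was written.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write nbyte bytes, retrying after partial writes.
   Returns the number of bytes written; a result short of nbyte means
   write() failed with errno set (e.g. EFBIG at the offset maximum). */
size_t write_all(int fd, const void *buf, size_t nbyte)
{
    const char *p = buf;
    size_t done = 0;

    while (done < nbyte) {
        ssize_t n = write(fd, p + done, nbyte - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved */
            break;              /* real error: report the partial count */
        }
        done += (size_t)n;      /* partial write: advance and retry */
    }
    return done;
}
```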

A.2.1.1.18 ulimit()

The ulimit() function will return an unspecified result if the result cannot be represented correctly in an object of type long. As this function is already obsolescent, the use of getrlimit() and setrlimit() is recommended for getting and setting process limits.
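A minimal sketch of the recommended replacement for reading the file size limit. ulimit(UL_GETFSIZE) returns the limit as a long count of 512-byte blocks. getrlimit(RLIMIT_FSIZE) reports it in bytes as an rlim_t. The function name get_fsize_limit() is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: store the soft file size limit, in bytes, in *bytes
   and set *unlimited if it is RLIM_INFINITY. Returns 0, or -1 on error. */
int get_fsize_limit(rlim_t *bytes, int *unlimited)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_FSIZE, &rl) == -1)
        return -1;
    *unlimited = (rl.rlim_cur == RLIM_INFINITY);
    *bytes = rl.rlim_cur;
    return 0;
}
```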

A.2.2 Changes to CAE Specification Commands and Utilities, Issue 4, Version 2

A.2.2.1 General Porting Suggestions

When porting a program to be large file capable, general areas of concern include:

A.2.2.1.1 Command Line Arguments

Numeric arguments which are file size related, such as a file offset or block count, need to be handled as an appropriately large type. Converting arguments into an off_t that is larger than a long may need to be accomplished with non-standard scanf() formats, if available, or with portable user-written functions that convert ASCII to a large off_t analogous to the strtol() function.
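A minimal sketch of such a user-written conversion, for non-negative decimal operands such as a dd count=n value. Two assumptions apply. First, off_t is a signed integer type without padding bits. Second, SKETCH_OFF_T_MAX and str_to_off() are hypothetical names. Unlike strtol(), the sketch rejects trailing characters instead of returning an end pointer.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>

/* Assumption: off_t is a signed integer type with no padding bits, so its
   largest value is 2^(bits-1) - 1. The macro name is hypothetical. */
#define SKETCH_OFF_T_MAX \
    ((off_t)((((uintmax_t)1) << (sizeof(off_t) * CHAR_BIT - 1)) - 1))

/* Hypothetical strtol()-like conversion of a non-negative decimal string to
   off_t. Leading white space is skipped; any other non-digit is rejected.
   Returns 0 and stores the value in *out, or returns -1 with errno set to
   EINVAL (not a number) or ERANGE (too large for off_t). */
int str_to_off(const char *s, off_t *out)
{
    const char *p = s;
    off_t v = 0;

    while (isspace((unsigned char)*p))
        p++;
    if (!isdigit((unsigned char)*p)) {
        errno = EINVAL;
        return -1;
    }
    for (; isdigit((unsigned char)*p); p++) {
        int d = *p - '0';
        /* v * 10 + d must not exceed the maximum: test before computing. */
        if (v > (SKETCH_OFF_T_MAX - d) / 10) {
            errno = ERANGE;
            return -1;
        }
        v = v * 10 + d;
    }
    if (*p != '\0') {
        errno = EINVAL;
        return -1;
    }
    *out = v;
    return 0;
}
```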

A.2.2.1.2 Output Formatting

Printing values of the converted types will probably require a different printf() format or a revised user-written conversion routine. Because the larger range of values can need more columns, the output layout may also need revision.
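One portable option, which postdates this paper, uses C99: cast the value to intmax_t and print it with the "%jd" conversion. This avoids guessing whether off_t is long or long long. A minimal sketch follows; the function names and the 20-column field width are assumptions of the sketch.

```c
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: format an off_t into buf. Returns the snprintf()
   result, i.e. the length the complete string requires. */
int format_off(char *buf, size_t size, off_t value)
{
    return snprintf(buf, size, "%jd", (intmax_t)value);
}

/* Hypothetical revised layout: a 20-column size field is wide enough for
   the 19 digits of the largest signed 64-bit value. */
void print_size_line(off_t size, const char *name)
{
    printf("%20jd %s\n", (intmax_t)size, name);
}
```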

A.2.2.1.3 Fixed Format Media Issues

Current implementations of the tar and cpio utilities are defective in their support of arbitrarily large files. The pax utility is equally defective, but is the subject of a proposal in ballot. (See 2.3.3 The pax Utility for discussion of this topic.)
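The defect comes from fixed-width numeric fields in the archive headers. A minimal sketch of the resulting check follows. As an illustration, it assumes the ustar header's 12-byte size field holds 11 octal digits plus a terminator, which is a common layout. That gives a ceiling of 8^11 - 1 = 8589934591 bytes. The helper name fits_octal_field() is hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero if size can be written as an octal
   number of at most `digits` digits, as a fixed-width header field needs. */
int fits_octal_field(uintmax_t size, unsigned digits)
{
    uintmax_t max = 0;
    unsigned i;

    /* Each octal digit holds 3 bits; enough digits can hold any uintmax_t. */
    if ((uintmax_t)digits * 3 >= sizeof(uintmax_t) * CHAR_BIT)
        return 1;
    for (i = 0; i < digits; i++)
        max = max * 8 + 7;      /* octal 7, 77, 777, ... */
    return size <= max;
}
```

Under that assumption, fits_octal_field(size, 11) is false for any file of 8GB or more.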

Vendor and third-party backup software is also unable to support large files and will require modification in order to do so.

A.2.2.1.4 Other Languages

This specification is for the C language only. Other languages have different support requirements. For example, the Fortran I/O API has a limit on the number of records, not bytes.

A.2.2.2 Considerations for Utilities in Support of Files of Arbitrary Size

The utilities listed in 2.3.1 Considerations for Utilities in Support of Files of Arbitrary Size are utilities which are used to perform administrative tasks such as to create, move, copy, remove, change the permissions, or measure the resources of a file. They are useful both as end-user tools and as utilities invoked by applications during software installation and operation.

Typical core utilities must be compiled in a "large" off_t compilation environment or must use the transitional APIs. Using the compilation environment reduces the number of editing changes required to port a program, but it does not reduce the effort required to ensure the correctness of the port.
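A minimal sketch of the compilation-environment approach. It assumes a system that follows the Large File Summit convention, as glibc does, where defining _FILE_OFFSET_BITS as 64 before any header is included selects a 64-bit off_t. On systems whose off_t is already 64 bits, the macro changes nothing. The function name file_size_of() is hypothetical.

```c
/* Large off_t compilation environment: must precede every #include. */
#define _FILE_OFFSET_BITS 64

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Unchanged source: in this environment off_t, open() and lseek() are the
   large-file-capable versions. Hypothetical helper: returns the size of the
   file at path, or -1 on error. */
off_t file_size_of(const char *path)
{
    int fd = open(path, O_RDONLY);
    off_t end;

    if (fd == -1)
        return -1;
    end = lseek(fd, 0, SEEK_END);
    close(fd);
    return end;
}
```

The transitional alternative instead defines _LARGEFILE64_SOURCE and calls off64_t-based interfaces such as open64() and lseek64() explicitly. Either way, code that prints or parses the returned value must still be reviewed, which is why the compilation environment saves editing but not verification.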

The chgrp, chmod, chown, ln, and rm utilities probably require use of large file capable versions of stat(), lstat(), ftw(), and the stat structure.

The cat, cksum, cmp, cp, dd, mv, sum, and touch utilities probably require use of large file capable versions of creat(), open(), and fopen().

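The cat, cksum, cmp, dd, df, du, ls, and sum utilities may require writing large integer values. For example, the file size reported by ls -l and the byte count reported by cksum can exceed the largest value of a 32-bit long.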

The dd, find and test utilities may need to interpret command arguments that contain 64-bit values. For dd the arguments include skip=n, seek=n, and count=n. For find the arguments include -size n. For test the arguments are those associated with algebraic comparisons.

The df utility might need to access large file systems with statvfs().
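A minimal sketch of the arithmetic involved: the block count is widened before it is multiplied by the fragment size, so the byte total does not overflow a 32-bit type. The function name avail_bytes() is hypothetical.

```c
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper: store in *out the number of bytes available to
   unprivileged users on the file system containing path. Returns 0, or -1
   on error. */
int avail_bytes(const char *path, uintmax_t *out)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) == -1)
        return -1;
    /* Widen first: f_bavail * f_frsize can exceed 32 bits. */
    *out = (uintmax_t)vfs.f_bavail * vfs.f_frsize;
    return 0;
}
```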

The ulimit utility will need to use large file capable versions of getrlimit() and setrlimit() and be able to read and write large integer values.

No standard interface is specified for converting between off_t (or other derived types) and ASCII, which is a significant practical deficiency. This is being considered by other groups. For example, see: ftp://ftp.dmk.com/DMK/sc22wg14/c9x/extended-integers/

A.2.2.3 Additional Requirements for the sh Utility - Porting Recommendations

Pathname expansion (e.g. expanding */foo.c to a/foo.c b/foo.c c/foo.c) and pathname completion might in some cases use the stat() function which would need to be large file capable.

The offset maximum used for shell input and output redirections is implementation-specific. Some vendors prefer to use the smallest supported off_t, others prefer the largest.


Read or download the complete Single UNIX Specification from http://www.UNIX-systems.org/go/unix.

Copyright © 1997-1998 The Open Group

UNIX is a registered trademark of The Open Group.