Linker Reference Manual
HI-TECH C incorporates a relocating assembler and linker to permit separate compilation of C source files. This means that a program may be divided into several source files, each of which may be kept to a manageable size for ease of editing and compilation. Each source file is then compiled separately, and finally all the object files are linked together into a single executable program.
The assembler is described in the machine-specific manual. This appendix describes the theory behind and the usage of the linker.
Relocation and Psects
The fundamental task of the linker is to combine several relocatable object files into one. The object files are said to be relocatable since they contain sufficient information for any references to program or data addresses (e.g. the address of a function) within the file to be adjusted according to where the file is ultimately located in memory after the linkage process. Relocation may take two basic forms: relocation by name, i.e. relocation by the ultimate value of a global symbol, or relocation by psect, i.e. relocation by the base address of a particular section of code, for example the section of code containing the actual executable instructions.
Any object file may contain bytes to be stored in memory in one or more program sections, which will be referred to as psects. These psects represent logical groupings of certain types of code bytes in the program. The section of the program containing executable instructions is normally referred to as the text psect. Other sections are the initialized data psect, called simply the data psect, and the uninitialized data psect, called the bss psect.
In fact the linker will handle any number of psects, and more may be used in special applications. However, the C compiler uses only the three mentioned, and the names text, data and bss are simply chosen for identification; the linker assigns no special significance to the name of a psect.
The difference between the data and bss psects may be exemplified by considering two external variables; one is initialized to the value 1, and the other is not initialized. The first will be placed into the data psect, and the second in the bss psect. The bss psect is always cleared to zeros on startup of the program, thus the second variable will be initialized at run time to zero. The first will however occupy space in the program file, and will maintain its initialized value of 1 at startup. It is quite possible to modify the value of a variable in the data psect during execution, however it is better practice not to do so, since this leads to more consistent use of variables, and allows for restartable and romable programs.
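The distinction can be illustrated with a short C fragment. This is a minimal sketch: the variable and function names are hypothetical and do not come from the manual. Under the scheme described above, the first variable would be placed in the data psect and the second in the bss psect.

```c
#include <stdio.h>

int initialized = 1;   /* has an initializer: placed in the data psect */
int uninitialized;     /* no initializer: placed in the bss psect, cleared at startup */

/* Print the values both variables hold when the program starts. */
void show_startup_values(void)
{
    printf("initialized = %d, uninitialized = %d\n",
           initialized, uninitialized);
}
```

Only the first variable contributes bytes to the program file; the second merely reserves space that the startup code clears.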
The text psect is the section into which all executable instructions are placed. On CP/M-80 the text psect will normally start at the base of the TPA, which is where execution commences. The data psect will normally follow the text psect, and the bss will be last. The bss does not occupy space in the program (.COM) file. This ordering of psects may be overridden by an option to the linker. This is especially useful when producing code for special hardware.
For MS-DOS and CP/M-86 the psects are ordered in the same way, but since the 8086 processor has segment registers providing relocation, both the text and data psects start at 0, even though they will be loaded one after the other in memory. This allows 64k code and 64k data and stack. Sufficient information is placed in the executable file (.EXE or .CMD) for the operating system to load the program in memory.
Local Psects and the Large Model
Since for practical purposes the psects are limited to 64K on the 8086, to allow more than 64K code the compiler makes use of local psects. A psect is considered local if the .psect directive has a LOCAL flag. Any number of local psects may be linked from different modules without being combined even if they have the same name. Note however that no local psect may have the same name as a global psect.
All references to a local psect within the same module (or within the same library) will be treated as references to the same psect. Between modules, however, two local psects of the same name are treated as distinct. In order to allow collective referencing of local psects via the -P option (described later), a local psect may have a class name associated with it. This is achieved with the CLASS flag on the .psect directive.
The linker handles only symbols which have been declared as global to the assembler. From the C source level, this means all names which have storage class external and which are not declared as static. These symbols may be referred to by modules other than the one in which they are defined. It is the linker's job to match up the definition of a global symbol with the references to it.
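As a minimal sketch of this rule at the C source level (all names here are hypothetical), the fragment below defines two global symbols that other modules could reference and two static symbols that stay private to the module.

```c
int shared_counter;          /* external: a global symbol the linker can match up */
static int private_counter;  /* static: local to this module, not seen by the linker */

/* External function: other modules may refer to it by name. */
int next_shared(void)
{
    return ++shared_counter;
}

/* Static function: can be called only from within this source file. */
static int next_private(void)
{
    return ++private_counter;
}

/* External function that uses both the global and the local helper. */
int next_both(void)
{
    return next_shared() + next_private();
}
```

Another module could declare `extern int shared_counter;` and call `next_shared()`, whereas a reference to `next_private` from another module would be left undefined at link time.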
A command to the linker takes the following form:
LINK options files ...
Options is zero or more linker options, each of which modifies the behaviour of the linker in some way. Files is one or more object files, and zero or more library names. The options recognized by the linker are listed below; they will be recognized in upper or lower case.
-R         Leave the output relocatable.
-L         Retain absolute relocation info. -LM will retain only segment relocation information.
-I         Ignore undefined symbols.
-N         Sort symbols by address.
-Caddr     Produce a binary output file offset by addr.
-S         Strip symbol information from the output file.
-X         Suppress local symbols in the output file.
-Z         Suppress trivial (compiler-generated) symbols in the output file.
-Oname     Call the output file name.
-Pspec     Spec is a psect location specification.
-Mname     Write a link map to the file name.
-Usymbol   Make symbol initially undefined.
-Dfile     Write a symbol file.
-Wwidth    Specify map width.
Taking each of these in turn:
The -R option will instruct the linker to leave the output file (as named by a -O option, or l.obj by default) relocatable. This is normally because there are further files to be linked in, and the output of this link will be used as input to the linker subsequently. Without this option, the linker will make the output file absolute, that is with all relocatable addresses made into absolute references. This option may not be used with the -L or -C options.
The -L option will cause the linker to output null relocation information even though the file will be absolute. This information allows self-relocating programs to know what addresses must be relocated at run time. This option is not usable with the -C option. In order to create an executable file (i.e. a .COM file) the program objtohex must be used. If a -LM option is used, only segment relocation information will be retained. This is used in conjunction with the large memory model. Objtohex will use the relocation information (when invoked with a -L flag) to insert segment relocation addresses into the executable file.
The -I option is used when it is desired to link code which contains symbols which are not defined in any module. This is normally only used during top-down program development, when routines are referenced in code written before the routines themselves have been coded.
When obtaining a link map via the -M option, the symbol table is by default sorted in order of symbol name. To sort in order of address, the -N option may be used.
The output of the linker is by default an object file. To create an executable program, this must be converted into an executable image. For CP/M this is a .COM file, which is simply an image of the executable program as it should appear in memory, starting at location 100H. The linker will produce such a file with the -C100H option. File formats for other applications requiring an image binary file may also be produced with the -C option. The address following the -C may be given in decimal (default), octal (by using o or O suffix) or hexadecimal (by using an h or H suffix).
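The address notation can be sketched in C as follows. This is an illustration of the suffix rule stated above, not the linker's own code; the function name parse_addr and its error convention are assumptions made for the example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse an address written in the notation described for -C:
 * decimal by default, octal with a trailing o or O, hexadecimal
 * with a trailing h or H. On success the value is stored in *out
 * and 0 is returned; malformed input returns -1.
 */
int parse_addr(const char *text, unsigned long *out)
{
    char digits[32];
    size_t len = strlen(text);
    int base = 10;
    char *end;

    if (len == 0 || len >= sizeof digits)
        return -1;

    /* The last character selects the base and is not part of the number. */
    switch (text[len - 1]) {
    case 'h': case 'H': base = 16; len--; break;
    case 'o': case 'O': base = 8;  len--; break;
    default: break;
    }

    /* Reject empty numbers, signs and leading white space. */
    if (len == 0 || !isxdigit((unsigned char)text[0]))
        return -1;

    memcpy(digits, text, len);
    digits[len] = '\0';

    *out = strtoul(digits, &end, base);
    return (*end == '\0') ? 0 : -1;   /* fail if any character was not a digit in this base */
}
```

With this sketch, "100H" yields 256, "100" yields 100 and "17o" yields 15.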
Note that because of the complexity of the executable file formats for MS-DOS and CP/M-86, LINK will not produce these (.EXE and .CMD resp.) formats directly. The compiler automatically runs OBJTOHEX with appropriate options to generate the correct file format.
The -S, -X and -Z options, which are meaningless when the -C option is used, will strip respectively all symbols, all local symbols or all trivial local symbols from the output file. Trivial symbols are symbols produced by the compiler, and have the form of one of a set of alphabetic characters followed by a digit string.
The default output file name is l.obj, or l.bin when the -C option is used. This may be overridden by the -Oname option. The output file will be called name in this instance. Note that no suffix is appended to the name; the file will be called exactly the argument to the option.
For certain specialized applications, e.g. producing code for an embedded microprocessor, it is necessary to specify to the linker at what address the various psects should be located. This is accomplished with the -P option. It is followed by a specification consisting of a comma-separated list of psect names, each with an optional address specification. In the absence of an address specification for a psect listed, it will be concatenated with the previous psect. For example:
-Ptext=0C000h,data,bss=8000h
This will cause the text psect to be located at 0C000H, the data psect to start at the end of the text psect, and the bss psect to start at 8000H. This may be for a processor with ROM at 0C000H and RAM at 8000H.
Where the link address, that is the address at which the code will be addressed at execution time, and the load address, that is the address offset within the output file, are different (e.g. for the 8086) it is possible to specify the load address separately from the link address. For example:
-Ptext=100h/0,data=0C000h/
This specification will cause the text psect to be linked for execution at 100h, but loaded in the output file at 0, while the data psect will be linked for 0C000h, but loaded contiguously with the text psect in the file. Note that if the slash ('/') is omitted, the load address is the same as the link address, while if the slash is supplied, but not followed by an address, the psect will be loaded after the previous psect.
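The three cases for the load address can be summarized in a small C sketch. The enum, the function name and the parameters are hypothetical and serve only to restate the rule above; they are not part of the linker.

```c
/* How the load address was written in a psect specification. */
enum load_form {
    LOAD_OMITTED,         /* no slash given */
    LOAD_AFTER_PREVIOUS,  /* slash given, but no address after it */
    LOAD_EXPLICIT         /* slash followed by an address */
};

/*
 * Return the load address of a psect.
 * link_addr      address the psect is linked for
 * explicit_load  address after the slash (used only for LOAD_EXPLICIT)
 * prev_load_end  load address just past the previous psect
 */
unsigned long resolve_load(enum load_form form, unsigned long link_addr,
                           unsigned long explicit_load,
                           unsigned long prev_load_end)
{
    switch (form) {
    case LOAD_OMITTED:        return link_addr;
    case LOAD_AFTER_PREVIOUS: return prev_load_end;
    case LOAD_EXPLICIT:       return explicit_load;
    }
    return link_addr;  /* not reached for valid enum values */
}
```

For instance, assuming a text psect of 200h bytes linked at 100h and loaded at 0, a data psect given a slash with no address would be loaded at offset 200h in the file, whatever its link address.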
In order to specify link and load addresses for local psects, the group name to which the psects belong may be used in place of a global psect name. The local psects will then have a link address as specified in the -P option, and load addresses incrementing upwards from the specified load address.
The -Mname option requests a link map, containing symbol table and module load address information to be written onto the file name. If name is omitted, the map will be written to standard output. -W may be used to specify the desired width of the map.
The -U option allows the specification to the linker of a symbol which is to be initially entered into the symbol table as undefined. This is useful when loading entirely from libraries. More than one -U flag may be used.
If it is desired to use the debugger on the program being linked, it is useful to produce a symbol file. The -Dfile option will write such a symbol file onto the named file, or l.sym if no file is given. The symbol file consists of a list of addresses and symbols, one per line.
Here are some examples of using the linker. Note however that in the normal case it is not necessary to invoke the linker explicitly, since it is invoked automatically by the C command.
LINK -MMAP -C100H START.OBJ MAIN.OBJ A:LIBC.LIB
This command links the files start.obj and main.obj with the library a:libc.lib. Only those modules that are required from the library will in fact be linked in. The output is to be in .COM format, placed in the default file l.bin. A map is to be written to the file named map. Note that the file start.obj should contain startup code, and in fact the lowest address code in that file will be executed when the program is run, since it will be at 100H.
LINK -X -R -OX.OBJ FILE1.OBJ FILE2.OBJ A:LIBC.LIB
The files file1.obj and file2.obj will be linked with any necessary routines from a:libc.lib and left in the file x.obj. This file will remain relocatable. Undefined symbols will not cause an error. The file x.obj will probably later be the object of another link invocation. All local symbols will be stripped from the output file, thus saving space.
Invoking the Linker
The linker is called LINK, and normally resides on the A: drive, under CP/M, or in the directory A:\HITECH\ under MS-DOS. It may be invoked with no arguments, in which case it will prompt for input from standard input. If the standard input is a file, no prompts will be printed. The input supplied in this manner may contain lower case, whereas CP/M converts the entire command line to upper case by default. This is useful with the -U and -P options. This manner of invocation is generally useful if the number of arguments to LINK is large. Even if the list of files is too long to fit on one line, continuation lines may be included by leaving a backslash ('\') at the end of the preceding line. In this fashion, LINK commands of almost unlimited length may be issued.