Linker Reference Manual

From HI-TECH C for CP/M Fan WIKI(EN)

Latest revision as of 11:16, 31 July 2017

HI-TECH C incorporates a relocating assembler and linker to permit separate compilation of C source files. This means that a program may be divided into several source files, each of which may be kept to a manageable size for ease of editing and compilation. Each source file is then compiled separately into an object file, and finally all the object files are linked together into a single executable program.

The assembler is described in the machine-specific manual. This appendix describes the theory behind and the usage of the linker.

Relocation and Psects

The fundamental task of the linker is to combine several relocatable object files into one. The object files are said to be relocatable because they contain sufficient information that any references to program or data addresses (e.g. the address of a function) within the file may be adjusted according to where the file is ultimately located in memory after the linkage process. Relocation may take two basic forms. The first is relocation by name, i.e. relocation by the ultimate value of a global symbol. The second is relocation by psect, i.e. relocation by the base address of a particular section of code, for example the section of code containing the actual executable instructions.
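To make the two forms of relocation concrete, here is a minimal C sketch of what a linker does with relocation records once final addresses are known. The record layout (`struct reloc`, `enum reloc_kind`) and the function name are invented for illustration and are not the HI-TECH object file format. The sketch assumes 16-bit little-endian address fields, as on the 8080/Z80 and 8086.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation record: not the real HI-TECH object format. */
enum reloc_kind { RELOC_BY_PSECT, RELOC_BY_NAME };

struct reloc {
    size_t offset;          /* byte position of a 16-bit field in the code */
    enum reloc_kind kind;   /* relocate by psect base or by symbol value */
    int index;              /* psect number or global symbol number */
};

/* Add the final psect base address or the final value of a global symbol
   to each 16-bit little-endian field named by a relocation record. */
void apply_relocations(uint8_t *code, const struct reloc *r, size_t n,
                       const uint16_t *psect_base, const uint16_t *symbol_value)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t *p = code + r[i].offset;
        uint16_t add = (r[i].kind == RELOC_BY_PSECT)
                           ? psect_base[r[i].index]
                           : symbol_value[r[i].index];
        uint16_t v = (uint16_t)(p[0] | (p[1] << 8));  /* read little-endian */
        v = (uint16_t)(v + add);
        p[0] = (uint8_t)(v & 0xFF);                   /* write it back */
        p[1] = (uint8_t)(v >> 8);
    }
}
```

For example, a field holding 0010H relative to a psect that ends up at 0100H becomes 0110H. A field relocated by name against a symbol whose final value is 1234H receives 1234H.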

Program Sections

Any object file may contain bytes to be stored in memory in one or more program sections, which will be referred to as psects. These psects represent logical groupings of certain types of code bytes in the program. The section of the program containing executable instructions is normally referred to as the text psect. Other sections are the initialized data psect, called simply the data psect, and the uninitialized data psect, called the bss psect.

The linker will in fact handle any number of psects, and more may be used in special applications. However, the C compiler uses only the three mentioned, and the names text, data and bss are simply chosen for identification; the linker assigns no special significance to the name of a psect.

The difference between the data and bss psects may be illustrated by considering two external variables: one is initialized to the value 1, and the other is not initialized. The first will be placed into the data psect, and the second in the bss psect. The bss psect is always cleared to zeros on startup of the program, so the second variable will be initialized at run time to zero. The first, however, will occupy space in the program file, and will hold its initialized value of 1 at startup. It is quite possible to modify the value of a variable in the data psect during execution. However, it is better practice not to do so, since this leads to more consistent use of variables, and allows for restartable and romable programs.
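The two external variables from this example might be written in C as follows. With HI-TECH C the first would go into the data psect and the second into bss. Typical modern toolchains behave similarly with their own .data and .bss sections. The accessor functions are only illustrative helpers so the startup values can be checked.

```c
/* Initialized external variable: data psect, the value 1 is stored in the program file. */
int initialized_var = 1;

/* Uninitialized external variable: bss psect, takes no file space, cleared to zero at startup. */
int uninitialized_var;

/* Illustrative accessors (hypothetical names). */
int get_initialized(void)   { return initialized_var; }
int get_uninitialized(void) { return uninitialized_var; }
```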

The text psect is the section into which all executable instructions are placed. On CP/M-80 the text psect will normally start at the base of the TPA, which is where execution commences. The data psect will normally follow the text psect, and the bss will be last. The bss does not occupy space in the program (.COM) file. This ordering of psects may be overridden by the -P option to the linker, described below. This is especially useful when producing code for special hardware.
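The default CP/M-80 arrangement can be expressed as simple address arithmetic. The C sketch below assumes the TPA starts at 100H and that no alignment padding is inserted between psects. The struct and function names are hypothetical. It computes where each psect starts and how many bytes the .COM file holds. The bss size is not needed, because bss does not appear in the file.

```c
#include <stdint.h>

/* Hypothetical summary of a CP/M-80 program layout. */
struct cpm_layout {
    uint16_t text_base, data_base, bss_base;
    uint16_t com_file_size;   /* bytes written to the .COM file */
};

/* Default ordering: text at the base of the TPA (100H), data directly
   after text, bss last. Only text and data are stored in the .COM file. */
struct cpm_layout default_cpm_layout(uint16_t text_size, uint16_t data_size)
{
    struct cpm_layout l;
    l.text_base = 0x100;
    l.data_base = (uint16_t)(l.text_base + text_size);
    l.bss_base  = (uint16_t)(l.data_base + data_size);
    l.com_file_size = (uint16_t)(text_size + data_size);
    return l;
}
```

With 1A00H bytes of text and 300H bytes of data, data starts at 1B00H, bss at 1E00H, and the .COM file is 1D00H bytes long.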

For MS-DOS and CP/M-86 the psects are ordered in the same way, but since the 8086 processor has segment registers providing relocation, both the text and data psects start at 0, even though they will be loaded one after the other in memory. This allows up to 64K of code and 64K of data and stack. Sufficient information is placed in the executable file (.EXE or .CMD) for the operating system to load the program into memory.
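Text and data can both start at offset 0 because each is reached through its own segment register. On the 8086, a physical address is segment × 16 + offset. The C sketch below illustrates this. It assumes, as a simplification, that the data segment begins at the first 16-byte paragraph after the text. The function names are hypothetical and do not describe the actual .EXE or .CMD loader.

```c
#include <stdint.h>

/* 8086 physical address: segment * 16 + offset (a 20-bit result). */
uint32_t physical_address(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Assumed loader behaviour: text is loaded at paragraph code_seg, and data
   follows it, rounded up to the next 16-byte paragraph boundary. */
uint16_t data_segment(uint16_t code_seg, uint16_t text_size)
{
    return (uint16_t)(code_seg + (text_size + 15u) / 16u);
}
```

With text loaded at segment 1000H and 1234H bytes long, data offset 0 lands at segment 1124H, physical address 11240H, just past the end of the text at 11234H.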

Local Psects and the Large Model

Since for practical purposes the psects are limited to 64K on the 8086, to allow more than 64K code the compiler makes use of local psects. A psect is considered local if the .psect directive has a LOCAL flag. Any number of local psects may be linked from different modules without being combined even if they have the same name. Note however that no local psect may have the same name as a global psect.

All references to a local psect within the same module (or within the same library) will be treated as references to the same psect. Between modules, however, two local psects of the same name are treated as distinct. To allow collective referencing of local psects via the -P option (described later), a local psect may have a class name associated with it. This is achieved with the CLASS flag on the .psect directive.
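The identity rules for global and local psects can be summarised as a comparison function. In the C sketch below, `struct psect_ref` and both functions are hypothetical. The `module` field stands for "module or library". A global psect is identified by name alone, while a local psect is identified by name together with the module it came from.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of a psect as seen by the linker. */
struct psect_ref {
    const char *name;
    bool local;           /* the .psect directive had the LOCAL flag */
    const char *module;   /* module (or library) the psect came from */
};

/* Same psect if the names match and either both are global, or both are
   local and come from the same module or library. */
bool same_psect(const struct psect_ref *a, const struct psect_ref *b)
{
    if (strcmp(a->name, b->name) != 0 || a->local != b->local)
        return false;
    return !a->local || strcmp(a->module, b->module) == 0;
}

/* A local psect may not have the same name as a global psect. */
bool name_clash(const struct psect_ref *a, const struct psect_ref *b)
{
    return a->local != b->local && strcmp(a->name, b->name) == 0;
}
```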

Global Symbols

The linker handles only symbols which have been declared as global to the assembler. From the C source level, this means all names which have storage class external and which are not declared as static. These symbols may be referred to by modules other than the one in which they are defined. It is the linker's job to match up the definition of a global symbol with the references to it.
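At the C source level the distinction looks like this. Only the external, non-static names become global symbols that the linker matches between modules; the static names stay private to their module. The variable and function names are illustrative only.

```c
/* External and not static: a global symbol. Another module may use it
   after declaring `extern int shared_count;`. */
int shared_count = 0;

/* static: local to this module, not visible to other modules. */
static int private_count = 0;

/* External function: also a global symbol. */
void bump(void)
{
    shared_count++;
    private_count++;
}

/* static function: local to this module. */
static int twice_private(void) { return 2 * private_count; }

/* External accessor so other modules can observe the private counter. */
int report(void) { return twice_private(); }
```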

Operation

A command to the linker takes the following form:

LINK options files ...


Options is zero or more linker options, each of which modifies the behaviour of the linker in some way. Files is one or more object files, and zero or more library names. The options recognized by the linker are listed below; they are accepted in either upper or lower case.

-R
Leave the output relocatable.
-L
Retain absolute relocation info. -LM will retain only segment relocation information.
-I
Ignore undefined symbols.
-N
Sort symbols by address.
-Caddr
Produce a binary output file offset by addr.
-S
Strip symbol information from the output file.
-X
Suppress local symbols in the output file.
-Z
Suppress trivial (compiler-generated) symbols in the output file.
-Oname
Call the output file name.
-Pspec
Spec is a psect location specification.
-Mname
Write a link map to the file name.
-Usymbol
Make symbol initially undefined.
-Dfile
Write a symbol file.
-Wwidth
Specify map width.

Taking each of these in turn:

The -R option will instruct the linker to leave the output file (as named by a -O option, or l.obj by default) relocatable. This is normally because there are further files to be linked in, and the output of this link will be used as input to the linker subsequently. Without this option, the linker will make the output file absolute, that is with all relocatable addresses made into absolute references. This option may not be used with the -L or -C options.

The -L option will cause the linker to output null relocation information even though the file will be absolute. This information allows self-relocating programs to know what addresses must be relocated at run time. This option is not usable with the -C option. In order to create an executable file (i.e. a .COM file) the program objtohex must be used. If a -LM option is used, only segment relocation information will be retained. This is used in conjunction with the large memory model. Objtohex will use the relocation information (when invoked with a -L flag) to insert segment relocation addresses into the executable file.
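Segment relocation information is consumed at load time in much the same way as the linker applies psect relocation. The C sketch below shows a generic version of that step. It assumes the image was linked as if it started at segment 0, and that the retained information is a list of byte offsets of 16-bit segment values. The function name and data layout are hypothetical and do not describe what objtohex actually emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Each entry in fixups is the byte offset of a 16-bit little-endian segment
   value in the image. Adding the segment at which the program was actually
   loaded makes every such far reference correct. */
void apply_segment_fixups(uint8_t *image, const size_t *fixups, size_t n,
                          uint16_t load_seg)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t *p = image + fixups[i];
        uint16_t seg = (uint16_t)(p[0] | (p[1] << 8));
        seg = (uint16_t)(seg + load_seg);
        p[0] = (uint8_t)(seg & 0xFF);
        p[1] = (uint8_t)(seg >> 8);
    }
}
```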

The -I option is used when it is desired to link code which contains symbols which are not defined in any module. This is normally only used during top-down program development, when routines are referenced in code written before the routines themselves have been coded.

When obtaining a link map via the -M option, the symbol table is by default sorted in order of symbol name. To sort in order of address, the -N option may be used.
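The two map orderings correspond to two comparison functions over the symbol table. This C sketch uses the standard `qsort`. The `struct symbol` layout and function names are illustrative. Ties in the address ordering are broken by name so that the result is deterministic, because `qsort` is not stable.

```c
#include <stdlib.h>
#include <string.h>

struct symbol {
    const char *name;
    unsigned address;
};

/* Default -M ordering: by symbol name. */
int cmp_by_name(const void *a, const void *b)
{
    const struct symbol *x = a, *y = b;
    return strcmp(x->name, y->name);
}

/* -N ordering: by address, ties broken by name. */
int cmp_by_address(const void *a, const void *b)
{
    const struct symbol *x = a, *y = b;
    if (x->address != y->address)
        return x->address < y->address ? -1 : 1;
    return strcmp(x->name, y->name);
}

void sort_symbols(struct symbol *syms, size_t n, int sort_by_address)
{
    qsort(syms, n, sizeof syms[0], sort_by_address ? cmp_by_address : cmp_by_name);
}
```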

The output of the linker is by default an object file. To create an executable program, this must be converted into an executable image. For CP/M this is a .COM file, which is simply an image of the executable program as it should appear in memory, starting at location 100H. The linker will produce such a file with the -C100H option. File formats for other applications requiring an image binary file may also be produced with the -C option. The address following the -C may be given in decimal (default), octal (by using o or O suffix) or hexadecimal (by using an h or H suffix).
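The address syntax accepted after -C has three forms: decimal by default, octal with an o/O suffix, and hexadecimal with an h/H suffix. The C sketch below parses these with `strtoul`. The function name is hypothetical, and rejecting malformed input is a choice made here rather than documented LINK behaviour.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 and stores the value in *out on success, or -1 if s is not a
   valid decimal, octal (o/O suffix) or hexadecimal (h/H suffix) number. */
int parse_link_address(const char *s, unsigned long *out)
{
    char buf[32];
    size_t len = strlen(s);
    int base = 10;
    char *end;

    if (len == 0 || len >= sizeof buf)
        return -1;
    memcpy(buf, s, len + 1);

    switch (buf[len - 1]) {          /* strip a radix suffix, if any */
    case 'h': case 'H': base = 16; buf[--len] = '\0'; break;
    case 'o': case 'O': base = 8;  buf[--len] = '\0'; break;
    }
    if (len == 0 || !isxdigit((unsigned char)buf[0]))
        return -1;                   /* no digits, or a sign or space */

    *out = strtoul(buf, &end, base);
    return *end == '\0' ? 0 : -1;    /* every character must be a digit */
}
```

So 100H, 256 and 400o all denote the same address, the CP/M .COM origin.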

Note that because of the complexity of the executable file formats for MS-DOS and CP/M-86, LINK will not produce these formats (.EXE and .CMD respectively) directly. The compiler automatically runs OBJTOHEX with appropriate options to generate the correct file format.

The -S, -X and -Z options, which are meaningless when the -C option is used, will strip respectively all symbols, all local symbols or all trivial local symbols from the output file. Trivial symbols are symbols produced by the compiler, and have the form of one of a set of alphabetic characters followed by a digit string.
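The trivial-symbol rule can be written as a small predicate. This manual does not list the actual set of letters the compiler uses, so the C sketch below takes it as a parameter. The set "LS" in the test is purely hypothetical. The sketch also assumes a single letter followed by at least one digit.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if name is one character from prefixes followed by one or more
   digits. prefixes is an assumption: the real set is compiler-defined. */
bool is_trivial_symbol(const char *name, const char *prefixes)
{
    if (name[0] == '\0' || strchr(prefixes, name[0]) == NULL)
        return false;
    if (name[1] == '\0')
        return false;                     /* no digit string */
    for (const char *p = name + 1; *p; p++)
        if (!isdigit((unsigned char)*p))
            return false;
    return true;
}
```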

The default output file name is l.obj, or l.bin when the -C option is used. This may be overridden by the -Oname option. The output file will be called name in this instance. Note that no suffix is appended to the name; the file will be called exactly the argument to the option.

For certain specialized applications, e.g. producing code for an embedded microprocessor, it is necessary to specify to the linker at what address the various psects should be located. This is accomplished with the -P option. It is followed by a specification consisting of a comma-separated list of psect names, each with an optional address specification. In the absence of an address specification for a psect listed, it will be concatenated with the previous psect. For example

-Ptext=0c000h,data,bss=8000h


This will cause the text psect to be located at 0C000H, the data psect to start at the end of the text psect, and the bss psect to start at 8000H. This may be for a processor with ROM at 0C000H and RAM at 8000H.
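The concatenation rule can be checked with a small layout routine. The C sketch below parses a -P string of the form shown above and assigns addresses from known psect sizes. The structs, function names and sizes are hypothetical. It handles only the `name[=addr]` form without load addresses. It assumes a first psect without an address starts at 0, which this manual does not state.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct psect_size { const char *name; unsigned long size; };
struct placement  { char name[16]; unsigned long addr; };

/* Number with optional h/H (hex) or o/O (octal) suffix, else decimal. */
static unsigned long parse_num(const char *s, size_t len)
{
    char buf[32];
    int base = 10;
    if (len >= sizeof buf)
        len = sizeof buf - 1;
    memcpy(buf, s, len);
    buf[len] = '\0';
    if (len > 0 && (buf[len - 1] == 'h' || buf[len - 1] == 'H')) {
        base = 16;
        buf[--len] = '\0';
    } else if (len > 0 && (buf[len - 1] == 'o' || buf[len - 1] == 'O')) {
        base = 8;
        buf[--len] = '\0';
    }
    return strtoul(buf, NULL, base);
}

static unsigned long lookup_size(const char *name, const struct psect_size *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(t[i].name, name) == 0)
            return t[i].size;
    return 0;                                  /* unknown psect: empty */
}

/* Place psects per a spec like "text=0c000h,data,bss=8000h".
   Returns the number of entries placed (at most max). */
size_t place_psects(const char *spec, const struct psect_size *sizes, size_t nsizes,
                    struct placement *out, size_t max)
{
    size_t n = 0;
    unsigned long next = 0;                    /* end of the previous psect */

    while (*spec != '\0' && n < max) {
        size_t len = strcspn(spec, ",");       /* length of this entry */
        size_t name_len = strcspn(spec, "=,"); /* length of the name part */
        size_t copy = name_len < sizeof out[n].name ? name_len : sizeof out[n].name - 1;

        memcpy(out[n].name, spec, copy);
        out[n].name[copy] = '\0';

        if (name_len < len)                    /* "name=addr" */
            out[n].addr = parse_num(spec + name_len + 1, len - name_len - 1);
        else                                   /* "name": follow previous psect */
            out[n].addr = next;

        next = out[n].addr + lookup_size(out[n].name, sizes, nsizes);
        n++;
        spec += len;
        if (*spec == ',')
            spec++;
    }
    return n;
}
```

With a 1000H-byte text psect, the example specification places data at 0D000H, directly after text, while bss is fixed at 8000H.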

Where the link address (the address at which the code will be addressed at execution time) and the load address (the address offset within the output file) are different, e.g. for the 8086, it is possible to specify the load address separately from the link address. For example:

-Ptext=100h/0,data=0C000h/


This specification will cause the text segment to be linked for execution at 100h, but loaded in the output file at 0, while the data segment will be linked for 0C000h, but loaded contiguously with the text psect in the file. Note that if the slash ('/') is omitted, the load address is the same as the link address, while if the slash is supplied, but not followed by an address, the psect will be loaded after the previous psect.
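The three slash rules can be stated as a small function over already-parsed -P entries. In the C sketch below, `struct psect_spec` is a hypothetical parsed form of one entry, and the sizes used in the test are made up. Load addresses follow the rules above, while link addresses are taken as given.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parsed form of one -P entry "name=link[/[load]]". */
struct psect_spec {
    unsigned long link;     /* link (execution) address */
    bool has_slash;         /* a '/' was written */
    bool has_load;          /* an address followed the '/' */
    unsigned long load;     /* that address, if has_load */
    unsigned long size;     /* psect size in bytes */
};

struct psect_addr { unsigned long link, load; };

/* no '/'           -> load address = link address
   '/' and address  -> load address as given
   '/' alone        -> loaded right after the previous psect in the file */
void assign_load_addresses(const struct psect_spec *s, struct psect_addr *out, size_t n)
{
    unsigned long prev_load_end = 0;
    for (size_t i = 0; i < n; i++) {
        out[i].link = s[i].link;
        if (!s[i].has_slash)
            out[i].load = s[i].link;
        else if (s[i].has_load)
            out[i].load = s[i].load;
        else
            out[i].load = prev_load_end;
        prev_load_end = out[i].load + s[i].size;
    }
}
```

For -Ptext=100h/0,data=0C000h/ with an 800H-byte text psect, data is linked at 0C000H but loaded at offset 800H in the file.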

In order to specify link and load addresses for local psects, the class (group) name to which the psects belong may be used in place of a global psect name. The local psects will then have a link address as specified in the -P option, and load addresses incrementing upwards from the specified load address.
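For a class of local psects, this rule amounts to "same link address, consecutive load addresses". In the C sketch below, the struct and function are hypothetical and the psects are assumed to be placed in the order given. Every psect in the class therefore runs at the same link address while occupying its own region of the output file.

```c
#include <stddef.h>

struct local_psect { unsigned long size, link, load; };

/* Give every local psect of a class the link address from -P, and load
   addresses that increase from the given load address. */
void place_class(struct local_psect *ps, size_t n,
                 unsigned long link, unsigned long load)
{
    for (size_t i = 0; i < n; i++) {
        ps[i].link = link;
        ps[i].load = load;
        load += ps[i].size;   /* next psect follows in the file */
    }
}
```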

The -Mname option requests a link map, containing symbol table and module load address information to be written onto the file name. If name is omitted, the map will be written to standard output. -W may be used to specify the desired width of the map.

The -U option allows the specification to the linker of a symbol which is to be initially entered into the symbol table as undefined. This is useful when loading entirely from libraries. More than one -U flag may be used.
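-U matters when everything comes from libraries because a library module is only linked in if it defines a symbol that is currently undefined. Without an initial undefined symbol, nothing would be pulled in. The C sketch below models that search. The module structure, the limits and the repeat-until-stable loop are assumptions made for illustration; this manual does not say whether LINK makes one pass or several over a library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 32

/* Hypothetical library module: one defined symbol, up to two references. */
struct module {
    const char *name;
    const char *defines;
    const char *refs[2];      /* NULL when unused */
};

struct symset { const char *s[MAX_SYMS]; size_t n; };

static bool set_has(const struct symset *set, const char *sym)
{
    for (size_t i = 0; i < set->n; i++)
        if (strcmp(set->s[i], sym) == 0)
            return true;
    return false;
}

static void set_add(struct symset *set, const char *sym)
{
    if (!set_has(set, sym) && set->n < MAX_SYMS)
        set->s[set->n++] = sym;
}

/* Seed the wanted set with the -U symbols, then repeatedly link any module
   that defines a wanted but not yet defined symbol, adding its references,
   until nothing changes. linked[i] reports which modules were linked. */
void search_library(const struct module *lib, size_t nmod,
                    const char *const *initial_undefined, size_t nundef,
                    bool *linked)
{
    struct symset wanted = {{0}, 0}, defined = {{0}, 0};
    bool changed = true;

    for (size_t i = 0; i < nundef; i++)
        set_add(&wanted, initial_undefined[i]);
    for (size_t i = 0; i < nmod; i++)
        linked[i] = false;

    while (changed) {
        changed = false;
        for (size_t i = 0; i < nmod; i++) {
            if (linked[i] || !set_has(&wanted, lib[i].defines)
                || set_has(&defined, lib[i].defines))
                continue;
            linked[i] = true;
            changed = true;
            set_add(&defined, lib[i].defines);
            for (size_t r = 0; r < 2; r++)
                if (lib[i].refs[r] != NULL)
                    set_add(&wanted, lib[i].refs[r]);
        }
    }
}
```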

If it is desired to use the debugger on the program being linked, it is useful to produce a symbol file. The -Dfile option will write such a symbol file onto the named file, or l.sym if no file is given. The symbol file consists of a list of addresses and symbols, one per line.

Examples

Here are some examples of using the linker. Note however that in the normal case it is not necessary to invoke the linker explicitly, since it is invoked automatically by the C command.

LINK -MMAP -C100H START.OBJ MAIN.OBJ A:LIBC.LIB


This command links the files start.obj and main.obj with the library a:libc.lib. Only those library modules that are actually required will be linked in. The output is to be in .COM format, placed in the default file l.bin. A map is to be written to the file named map. Note that the file start.obj should contain startup code; the lowest-address code in that file will be executed when the program is run, since it will be at 100H.

LINK -X -R -OX.OBJ FILE1.OBJ FILE2.OBJ A:LIBC.LIB


The files file1.obj and file2.obj will be linked with any necessary routines from a:libc.lib and left in the file x.obj. This file will remain relocatable. Undefined symbols will not cause an error. The file x.obj will probably later be the object of another link invocation. All local symbols will be stripped from the output file, thus saving space.

Invoking the Linker

The linker is called LINK, and normally resides on the A: drive under CP/M, or in the directory A:\HITECH\ under MS-DOS. It may be invoked with no arguments, in which case it will prompt for input from standard input. If the standard input is a file, no prompts will be printed. The input supplied in this manner may contain lower case, whereas CP/M converts the entire command line to upper case by default. This is useful with the -U and -P options. This manner of invocation is generally useful if the number of arguments to LINK is large. Even if the list of files is too long to fit on one line, continuation lines may be included by leaving a backslash ('\') at the end of the preceding line. In this fashion, LINK commands of almost unlimited length may be issued.
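The backslash continuation convention can be sketched as a line-joining routine. In the C code below, the function name and the choice to replace each trailing backslash with a single space are assumptions. This manual does not say exactly how LINK splices continued lines.

```c
#include <stddef.h>
#include <string.h>

/* Join input lines into one command: a trailing backslash means "continued
   on the next line" and is replaced by a space. Returns the number of lines
   consumed, or 0 if out is too small. */
size_t join_continuation_lines(const char *const *lines, size_t nlines,
                               char *out, size_t outsize)
{
    size_t used = 0;
    if (outsize == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < nlines; i++) {
        size_t len = strlen(lines[i]);
        int continued = len > 0 && lines[i][len - 1] == '\\';
        if (continued)
            len--;                        /* drop the backslash */
        if (used + len + 1 >= outsize)    /* room for text, space and NUL */
            return 0;
        memcpy(out + used, lines[i], len);
        used += len;
        if (continued)
            out[used++] = ' ';
        out[used] = '\0';
        if (!continued)
            return i + 1;                 /* command is complete */
    }
    return nlines;
}
```

Fed the three input lines `-X -R -OX.OBJ\`, `FILE1.OBJ FILE2.OBJ\` and `A:LIBC.LIB`, this produces the single command `-X -R -OX.OBJ FILE1.OBJ FILE2.OBJ A:LIBC.LIB`.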