COMPILERS, ASSEMBLERS, LINKERS and LOADERS
- Building a C program normally involves four stages and uses a different tool for each: a preprocessor, a compiler, an assembler, and a linker.
- The end result should be a single executable file. The stages below happen in this order regardless of the operating system or compiler, and are illustrated in Figure w.1.
1. Preprocessing is the first pass of any C compilation. It processes include files, conditional compilation instructions, and macros.
2. Compilation is the second pass. It takes the output of the preprocessor (the expanded source code) and generates assembly source code.
3. Assembly is the third stage. It takes the assembly source code and translates it into machine code, optionally producing an assembly listing with offsets. The assembler output is stored in an object file.
4. Linking is the final stage. It takes one or more object files or libraries as input and combines them to produce a single (usually executable) file. In doing so, it resolves references to external symbols, assigns final addresses to procedures/functions and variables, and revises code and data to reflect the new addresses (a process called relocation).
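In Linking Libraries:
§ The program is linked with the libraries that implement the functions declared in its included header files (the headers themselves are handled by the preprocessor, not the linker).
§ The program is linked with any other libraries it uses.
§ This process is carried out by the linker.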
The linker is what makes separate compilation possible. As the figure below shows, an executable can be built from a number of source files, each of which is compiled and assembled into its own object file independently.
- The following figure shows the steps involved in building a C program, from compilation through to loading the executable image into memory so the program can run.
Figure - Compile, link and execute stages for running program (a process)
OBJECT FILES and EXECUTABLE
- Assembling the source code produces an object file (e.g. .o, .obj). Object files are then linked to produce an executable file.
- Object and executable files come in several formats, such as ELF (Executable and Linking Format) and COFF (Common Object File Format). For example, ELF is used on Linux systems, while Windows uses COFF for object files and the COFF-derived PE (Portable Executable) format for executables.
PROCESS LOADING
§ In Linux, programs loaded from a file system (using the execve() system call, or a library wrapper such as posix_spawn()) are normally in ELF format.
§ If the file system is on a block-oriented
device, the code and data are loaded into main memory.
§ If the file system is memory mapped (e.g. a ROM/Flash image), the code needn't be loaded into RAM and may be executed in place.
§ Executing in place leaves the code in ROM or Flash and makes all RAM available for data and stack. In all cases, if the same program is loaded more than once, its code is shared.
§ Before we can run an executable, we first have to load it into memory.
§ This is done by the loader, which is generally part of the operating system. Among other things, the loader does the following:
1. Memory and access validation - First, the OS kernel reads the program file's header information and validates the file type, access permissions, memory requirements, and whether the machine can run its instructions. It confirms that the file is an executable image and calculates its memory requirements.
2. Process setup includes:
i. Allocates primary memory for the program's execution.
ii. Copies the address space from secondary to primary memory.
iii. Copies the .text and .data sections from the executable into primary memory.
iv. Copies program arguments (e.g., command line arguments) onto the stack.
v. Initializes registers: sets the stack pointer (esp on 32-bit x86) to point to the top of the stack and clears the rest.
vi. Jumps to the start routine, which copies main()'s arguments off the stack and jumps to main().
- The address space is the memory that contains the program's code, stack, and data segments; in other words, everything the program uses as it runs.
- The memory layout, which consists of three segments (text, data, and stack), is shown in simplified form in Figure w.5.
- The dynamic part of the data segment is also referred to as the heap; it is where dynamically allocated memory (such as from malloc() and new) comes from. Dynamically allocated memory is memory allocated at run time instead of compile/link time.
- In this layout the heap and the stack sit at opposite ends of the free memory and grow toward each other, so that memory can be divided in any proportion between the heap (allocated explicitly) and the stack (allocated implicitly). This explains why the stack grows downward and the heap grows upward.