Friday, 4 October 2013

About Compiler Assembler Linker and Loader in C


COMPILERS, ASSEMBLERS, LINKERS and LOADER

  • Building a C program normally involves four stages and uses different ‘tools’: a preprocessor, a compiler, an assembler, and a linker.
  • The end result is a single executable file.  The stages below happen in the same order regardless of the operating system or compiler, and are illustrated in Figure w.1.

1.       Preprocessing is the first pass of any C compilation. It processes include files, conditional compilation directives, and macros.

2.       Compilation is the second pass. It takes the output of the preprocessor (the preprocessed source code) and generates assembly source code.

3.       Assembly is the third stage of compilation. It takes the assembly source code and translates it into machine code, optionally producing an assembly listing with offsets. The assembler output is stored in an object file.

4.       Linking is the final stage of compilation. It takes one or more object files or libraries as input and combines them to produce a single (usually executable) file. In doing so, it resolves references to external symbols, assigns final addresses to procedures/functions and variables, and revises code and data to reflect new addresses (a process called relocation).

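When linking libraries:

§  The program's calls to library functions, which are declared in the included header files, are resolved against the libraries that define them.

§  The program is linked with any other libraries it uses.

§  This process is carried out by the linker.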

The linker is what makes separate compilation possible. As the figure below shows, an executable can be built from a number of source files, each of which is compiled and assembled into its own object file independently.

 
  • The following figure shows the steps involved in building a C program, from compilation through to loading the executable image into memory so that the program can run.

Figure - Compile, link and execute stages for a running program (a process)

OBJECT FILES and EXECUTABLE
  • After the source code has been assembled, the result is an object file (e.g. .o, .obj); the object files are then linked to produce an executable file.
  • Object and executable files come in several formats, such as ELF (Executable and Linkable Format, originally called Executable and Linking Format) and COFF (Common Object File Format).  For example, ELF is used on Linux systems, while Windows uses a COFF-based format (PE/COFF).
 
PROCESS LOADING
§  In Linux, processes loaded from a file system (for example through the execve() system call or the posix_spawn() library function) are in ELF format.
§  If the file system is on a block-oriented device, the code and data are loaded into main memory.
§  If the file system is memory mapped (e.g. ROM/Flash image), the code needn't be loaded into RAM, but may be executed in place.
§  Executing in place leaves all RAM available for data and stack, with the code staying in ROM or Flash. In all cases, if the same program is loaded more than once, its code is shared.
§  Before an executable can run, it first has to be loaded into memory.
§  This is done by the loader, which is generally part of the operating system. The loader does the following things (among others):
1.       Memory and access validation - First, the OS kernel reads the program file’s header information and validates the file type, access permissions, memory requirements, and whether the machine can run its instructions.  It confirms that the file is an executable image and calculates the memory requirements.
2.       Process setup includes:
          i.    Allocates primary memory for the program's execution.
          ii.   Copies address space from secondary to primary memory.
          iii.  Copies the .text and .data sections from the executable into primary memory.
          iv.   Copies program arguments (e.g., command line arguments) onto the stack.
          v.    Initializes registers: sets the stack pointer (esp on 32-bit x86) to point to the top of the stack, clears the rest.
          vi.   Jumps to the start routine, which copies main()'s arguments off the stack and jumps to main().
  • The address space is the memory that contains the program's code, stack, and data segments; in other words, everything the program uses as it runs.
  • The memory layout, which consists of three segments (text, data, and stack), is shown in simplified form in Figure w.5.
  • The dynamic data segment is also referred to as the heap; it is where dynamically allocated memory (such as from malloc() in C and new in C++) comes from. Dynamically allocated memory is memory allocated at run time instead of at compile/link time.
  • In this traditional layout the heap and the stack sit at opposite ends of the same free region and grow toward each other, so the free memory can be divided in any proportion between the heap (allocated explicitly) and the stack (allocated implicitly). This is why the stack grows downward and the heap grows upward.
