Running Programs

A software product is a set of executable components.
An executable component may be intended for execution directly by the hardware of the computing system or by some interpreter.
- Binary interpretation and JIT, as well as software emulation/virtualization, are not treated separately here; all that matters is whether native code is being executed or not.
The following kinds of executable components are distinguished:
- Programs (which have a designated entry point, where execution begins)
- Libraries (which have no designated entry point)
POSIX systems separate creating a process (done with the fork() call) from running a file (the exec*() family of calls), which happens within an already existing process.
Consider a C program whose source consists of two files: one file calls a function defined in the other, then calls the library function printf and exits.
On Linux, running a file is implemented by the execve() system call.
This system call loads a new program into memory and transfers execution to it. The process context is preserved, except that many of its attributes are reset, as the execve(2) man page describes:
- "All process attributes are preserved during an execve(), except the following:
  - The dispositions of any signals that are being caught are reset to the default (signal(7)).
  - Any alternate signal stack is not preserved (sigaltstack(2)).
  - Memory mappings are not preserved (mmap(2)).
  - Attached System V shared memory segments are detached (shmat(2)).
  - POSIX shared memory regions are unmapped (shm_open(3)).
  - Open POSIX message queue descriptors are closed (mq_overview(7)).
  - Any open POSIX named semaphores are closed (sem_overview(7)).
  - POSIX timers are not preserved (timer_create(2)).
  - Any open directory streams are closed (opendir(3)).
  - Memory locks are not preserved (mlock(2), mlockall(2)).
  - Exit handlers are not preserved (atexit(3), on_exit(3)).
  - The floating-point environment is reset to the default (see fenv(3)).
- The process attributes in the preceding list are all specified in POSIX.1-2001. The following Linux-specific process attributes are also not preserved during an execve():
  - The prctl(2) PR_SET_DUMPABLE flag is set, unless a set-user-ID or set-group ID program is being executed, in which case it is cleared.
  - The prctl(2) PR_SET_KEEPCAPS flag is cleared.
  - (Since Linux 2.4.36 / 2.6.23) If a set-user-ID or set-group-ID program is being executed, then the parent death signal set by prctl(2) PR_SET_PDEATHSIG flag is cleared.
  - The process name, as set by prctl(2) PR_SET_NAME (and displayed by ps -o comm), is reset to the name of the new executable file.
  - The SECBIT_KEEP_CAPS securebits flag is cleared. See capabilities(7).
  - The termination signal is reset to SIGCHLD (see clone(2)).
- Note the following further points:
  - All threads other than the calling thread are destroyed during an execve(). Mutexes, condition variables, and other pthreads objects are not preserved.
  - The equivalent of setlocale(LC_ALL, "C") is executed at program start-up.
  - POSIX.1-2001 specifies that the dispositions of any signals that are ignored or set to the default are left unchanged. POSIX.1-2001 specifies one exception: if SIGCHLD is being ignored, then an implementation may leave the disposition unchanged or reset it to the default; Linux does the former.
  - Any outstanding asynchronous I/O operations are canceled (aio_read(3), aio_write(3)).
  - For the handling of capabilities during execve(), see capabilities(7)."
- What mainly survives: open file descriptors (except those with FD_CLOEXEC set), the uid/gid (unless the file has the setuid/setgid bits set), capabilities, and the security context (the last two are strongly affected by LSMs).
How a file is loaded and starts executing
- To support different executable file formats, the Linux code responsible for each format is packaged as a module (a format module)
  - Among them are genuine handlers for particular formats (a.out, ELF, BFLT) as well as wrappers (notably binfmt_script and binfmt_misc)
Why binary formats are needed at all
- Из dsohowto: "The main aspect is the binary format. This is the format which is used to describe the application code. Long gone are the days that it was sufficient to provide a memory dump. Multi-process systems need to identify different parts of the file containing the program such as the text, data, and debug information parts. For this, binary formats were introduced early on. Commonly used in the early Unix-days were formats such as a.out or COFF. These binary formats were not designed with shared libraries in mind and this clearly shows."
- The need to do extra work mostly follows from the desire to use shared libraries. Calling a function means transferring control to its address, and for a function in a shared library that address is not known until the library is loaded.
- Linkers and loaders, ch. 1: "loaders perform several related but conceptually separate actions.
  - Program loading: Copy a program from secondary storage (which since about 1968 invariably means a disk) into main memory so it's ready to run. In some cases loading just involves copying the data from disk to memory, in others it involves allocating storage, setting protection bits, or arranging for virtual memory to map virtual addresses to disk pages.
  - Relocation: Compilers and assemblers generally create each file of object code with the program addresses starting at zero, but few computers let you load your program at location zero. If a program is created from multiple subprograms, all the subprograms have to be loaded at non-overlapping addresses. Relocation is the process of assigning load addresses to the various parts of the program, adjusting the code and data in the program to reflect the assigned addresses. In many systems, relocation happens more than once. It's quite common for a linker to create a program from multiple subprograms, and create one linked output program that starts at zero, with the various subprograms relocated to locations within the big program. Then when the program is loaded, the system picks the actual load address and the linked program is relocated as a whole to the load address.
  - Symbol resolution: When a program is built from multiple subprograms, the references from one subprogram to another are made using symbols; a main program might use a square root routine called sqrt, and the math library defines sqrt. A linker resolves the symbol by noting the location assigned to sqrt in the library, and patching the caller's object code to so the call instruction refers to that location."

Materials

eSyr/BuildSystemsCoursePlan/010Exec (last edited by eSyr 2015-02-13 17:33:44)