X-Git-Url: http://demsky.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FSystemLibrary.html;h=0289a554108b6f8d26ac36d6c540c694702d563d;hb=a75ce9f5d2236d93c117e861e60e6f3f748c9555;hp=9858cdb40a1b81971928bd7768068029d7804ee9;hpb=7acb866f109cc04dc3f01e01b7deefd421005459;p=oota-llvm.git
diff --git a/docs/SystemLibrary.html b/docs/SystemLibrary.html
index 9858cdb40a1..0289a554108 100644
--- a/docs/SystemLibrary.html
+++ b/docs/SystemLibrary.html
@@ -8,173 +8,148 @@
System Library
- -
-

Warning: This document is a work in progress.

-
-
-

Written by Reid Spencer

+

Written by Reid Spencer

Abstract
-

This document describes the requirements, design, and implementation - details of LLVM's System Library. The library is composed of the header files - in llvm/include/llvm/System and the source files in - llvm/lib/System. The goal of this library is to completely shield - LLVM from the variations in operating system interfaces. By centralizing - LLVM's use of operating system interfaces, we make it possible for the LLVM - tool chain and runtime libraries to be more easily ported to new platforms - since (theoretically) only llvm/lib/System needs to be ported. This - library also unclutters the rest of LLVM from #ifdef use and special - cases for specific operating systems. Such uses are replaced with simple calls - to the interfaces provided in llvm/include/llvm/System.

Note that - lib/System is not intended to be a complete operating system wrapper (such as - the Adaptive Communications Environment (ACE) or Apache Portable Runtime - (APR)), but only to provide the functionality necessary to support LLVM. +

This document provides some details on LLVM's System Library, located in + the source at lib/System and include/llvm/System. The + library's purpose is to shield LLVM from the differences between operating + systems for the few services LLVM needs from the operating system. Much of + LLVM is written using portability features of standard C++. However, in a few + areas, system dependent facilities are needed and the System Library is the + wrapper around those system calls.

+

By centralizing LLVM's use of operating system interfaces, we make it + possible for the LLVM tool chain and runtime libraries to be more easily + ported to new platforms since (theoretically) only lib/System needs + to be ported. This library also unclutters the rest of LLVM from #ifdef use + and special cases for specific operating systems. Such uses are replaced + with simple calls to the interfaces provided in include/llvm/System. +

+

Note that the System Library is not intended to be a complete operating + system wrapper (such as the Adaptive Communications Environment (ACE) or + Apache Portable Runtime (APR)), but only provides the functionality necessary + to support LLVM.

The System Library was written by Reid Spencer who formulated the - design based on similar original work as part of the eXtensible Programming - System (XPS).

+ design based on similar work originating from the eXtensible Programming + System (XPS). Several people helped with the effort; especially, + Jeff Cohen and Henrik Bach on the Win32 port.

- System Library Requirements + Keeping LLVM Portable
-

The System library's requirements are aimed at shielding LLVM from the - variations in operating system interfaces. The following sections define the - requirements needed to fulfill this objective. Of necessity, these requirements - must be strictly followed in order to ensure the library's goal is reached.

+

In order to keep LLVM portable, LLVM developers should adhere to a set of + portability rules associated with the System Library. Adherence to these rules + should help the System Library achieve its goal of shielding LLVM from the + variations in operating system interfaces and doing so efficiently. The + following sections define the rules needed to fulfill this objective.

-
Hide System Header Files
+
Don't Include System Headers +
-

The library must sheild LLVM from all system libraries. To obtain - system level functionality, LLVM must #include "llvm/System/Thing.h" - and nothing else. This means that Thing.h cannot expose any system - header files. This protects LLVM from accidentally using system specific - functionality except through the lib/System interface. Specifically this - means that header files like "unistd.h", "windows.h", "stdio.h", and - "string.h" are verbotten outside the implementation of lib/System. -

+

Except in lib/System, no LLVM source code should directly + #include a system header. Care has been taken to remove all such + #includes from LLVM while lib/System was being + developed. Specifically this means that header files like "unistd.h", + "windows.h", "stdio.h", and "string.h" are forbidden to be included by LLVM + source code outside the implementation of lib/System.

+

To obtain system-dependent functionality, existing interfaces to the system + found in include/llvm/System should be used. If an appropriate + interface is not available, it should be added to include/llvm/System + and implemented in lib/System for all supported platforms.

-
Allow Standard C Headers +
Don't Expose System Headers
+
+

The System Library must shield LLVM from all system headers. To + obtain system level functionality, LLVM source must + #include "llvm/System/Thing.h" and nothing else. This means that + Thing.h cannot expose any system header files. This protects LLVM + from accidentally using system specific functionality and only allows it + via the lib/System interface.

+
+ + +
Use Standard C Headers

The standard C headers (the ones beginning with "c") are allowed - to be exposed through the lib/System interface. These headers and the things - they declare are considered to be platform agnostic. LLVM source files may - include them or obtain their inclusion through lib/System interfaces.

+ to be exposed through the lib/System interface. These headers and + the things they declare are considered to be platform agnostic. LLVM source + files may include them directly or obtain their inclusion through + lib/System interfaces.

-
Allow Standard C++ Headers +

The standard C++ headers from the standard C++ library and - standard template library are allowed to be exposed through the lib/System + standard template library may be exposed through the lib/System interface. These headers and the things they declare are considered to be platform agnostic. LLVM source files may include them or obtain their inclusion through lib/System interfaces.

- -
-

Any functions defined by system libraries (i.e. not defined by lib/System) - must not be exposed through the lib/System interface, even if the header file - for that function is not exposed. This prevents inadvertent use of system - specific functionality.

-

For example, the stat system call is notorious for having - variations in the data it provides. lib/System must not declare stat - nor allow it to be declared. Instead it should provide its own interface to - discovering information about files and directories. Those interfaces may be - implemented in terms of stat but that is strictly an implementation - detail.

-
- - - -
-

Any data defined by system libraries (i.e. not defined by lib/System) must - not be exposed through the lib/System interface, even if the header file for - that function is not exposed. As with functions, this prevents inadvertent use - of data that might not exist on all platforms.

-
- - - +
-

If an error occurs that lib/System cannot handle, the only action taken by - lib/System is to throw an instance of std:string. The contents of the string - must explain both what happened and the context in which it happened. The - format of the string should be a (possibly empty) list of contexts each - terminated with a : and a space, followed by the error message, optionally - followed by a reason, and optionally followed by a suggestion.

-

For example, failure to open a file named "foo" could result in a message - like:

-
  • foo: Unable to open file because it doesn't exist."
-

The "foo:" part is the context. The "Unable to open file" part is the error - message. The "because it doesn't exist." part is the reason. This message has - no suggestion. Where possible, the imlementation of lib/System should use - operating system specific facilities for converting the error code returned by - a system call into an error message. This will help to make the error message - more familiar to users of that type of operating system.

-

Note that this requirement precludes the throwing of any other exceptions. - For example, various C++ standard library functions can cause exceptions to be - thrown (e.g. out of memory situation). In all cases, if there is a possibility - that non-string exceptions could be thrown, the lib/System library must ensure - that the exceptions are translated to std::string form.

+

The entry points specified in the interface of lib/System must be aimed at + completing some reasonably high level task needed by LLVM. We do not want to + simply wrap each operating system call. It would be preferable to wrap several + operating system calls that are always used in conjunction with one another by + LLVM.

+

For example, consider what is needed to execute a program, wait for it to + complete, and return its result code. On Unix, this involves the following + operating system calls: getenv, fork, execve, and wait. The + correct thing for lib/System to provide is a function, say + ExecuteProgramAndWait, that implements the functionality completely. + What we don't want is wrappers for the operating system calls involved.

+

There must not be a one-to-one relationship between operating + system calls and the System library's interface. Any such interface function + will be suspicious.
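
A minimal sketch of such a high level entry point for the Unix class follows. It is illustrative only and not the lib/System implementation: the `sketch` namespace, the signature, and the convention of returning -1 on launch or wait failure are assumptions. It also takes a full program path, so the `getenv`-based search of `PATH` mentioned above is left out.

```cpp
#include <cerrno>
#include <string>
#include <vector>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

namespace sketch {

// Hypothetical interface: run the program at `path` with `args`, block until
// it finishes, and return its exit code. Returns -1 if the child could not be
// created or did not exit normally. A child whose exec fails exits with 127.
int ExecuteProgramAndWait(const std::string &path,
                          const std::vector<std::string> &args) {
  // Build the null-terminated argv array that execv expects.
  std::vector<char *> argv;
  argv.push_back(const_cast<char *>(path.c_str()));
  for (const std::string &arg : args)
    argv.push_back(const_cast<char *>(arg.c_str()));
  argv.push_back(nullptr);

  pid_t child = fork();
  if (child < 0)
    return -1; // Could not create a process.

  if (child == 0) {
    // In the child: replace the process image. execv returns only on failure.
    execv(path.c_str(), argv.data());
    _exit(127);
  }

  // In the parent: wait for the child, retrying if a signal interrupts us so
  // that "system call interrupted" never reaches the caller.
  int status = 0;
  while (waitpid(child, &status, 0) < 0) {
    if (errno != EINTR)
      return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

} // namespace sketch
```

A caller would write something like `int rc = sketch::ExecuteProgramAndWait("/bin/sh", {"-c", "exit 3"});` and never see `fork`, `execv`, or `waitpid` directly.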

- +
-

None of the lib/System interface functions may be declared with C++ - throw() specifications on them. This requirement makes sure that the - compler does not insert addtional exception handling code into the interface - functions. This is a performance consideration: lib/System functions are at - the bottom of the many call chains and as such can be frequently called. We - need them to be as efficient as possible.

+

There must be no functionality specified in the interface of lib/System + that isn't actually used by LLVM. We're not writing a general purpose + operating system wrapper here, just enough to satisfy LLVM's needs. And, LLVM + doesn't need much. This design goal aims to keep the lib/System interface + small and understandable which should foster its actual use and adoption.

@@ -184,66 +159,65 @@

The implementation of a function for a given platform must be written exactly once. This implies that it must be possible to apply a function's implementation to multiple operating systems if those operating systems can - share the same implementation.

+ share the same implementation. This rule applies to the set of operating + systems supported for a given class of operating system (e.g. Unix, Win32). +

- -
System Library Design
+ +
No Virtual Methods
-

In order to fulfill the requirements of the system library, strict design - objectives must be maintained in the library as it evolves. The goal here - is to provide interfaces to operating system concepts (files, memory maps, - sockets, signals, locking, etc) efficiently and in such a way that the - remainder of LLVM is completely operating system agnostic.

+

The System Library interfaces can be called quite frequently by LLVM. In + order to make those calls as efficient as possible, we discourage the use of + virtual methods. There is no need to use inheritance for implementation + differences; it just adds complexity. The #include mechanism works + just fine.

-
No Unused Functionality
+
No Exposed Functions
-

There must be no functionality specified in the interface of lib/System - that isn't actually used by LLVM. We're not writing a general purpose - operating system wrapper here, just enough to satisfy LLVM's needs. And, LLVM - doesn't need much. This design goal aims to keep the lib/System interface - small and understandable which should foster its actual use and adoption.

+

Any functions defined by system libraries (i.e. not defined by lib/System) + must not be exposed through the lib/System interface, even if the header file + for that function is not exposed. This prevents inadvertent use of system + specific functionality.

+

For example, the stat system call is notorious for having + variations in the data it provides. lib/System must not declare + stat nor allow it to be declared. Instead it should provide its own + interface to discovering information about files and directories. Those + interfaces may be implemented in terms of stat but that is strictly + an implementation detail. The interface provided by the System Library must + be implemented on all platforms (even those without stat).
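
As a rough illustration of this rule, the sketch below offers a small file-status interface whose Unix implementation happens to use `stat`. The names `FileStatus` and `GetFileStatus`, the `sketch` namespace, and the chosen fields are hypothetical and not taken from lib/System. In a real layout the `<sys/stat.h>` include would appear only in the implementation file, never in the header that LLVM code sees.

```cpp
#include <cstdint>
#include <string>

// Implementation detail: in a real tree this include would live only in the
// Unix implementation file, not in the public header.
#include <sys/stat.h>

namespace sketch {

// Hypothetical platform-neutral description of a file system entry.
// Nothing here mentions struct stat or any of its fields.
struct FileStatus {
  bool exists = false;
  bool isDirectory = false;
  std::uint64_t size = 0;
};

// Hypothetical interface function. A missing path is reported through the
// `exists` flag instead of an error.
FileStatus GetFileStatus(const std::string &path) {
  FileStatus result;
  struct stat buf;
  if (::stat(path.c_str(), &buf) != 0)
    return result; // Treat failure to stat as "does not exist".
  result.exists = true;
  result.isDirectory = S_ISDIR(buf.st_mode);
  result.size = static_cast<std::uint64_t>(buf.st_size);
  return result;
}

} // namespace sketch
```

A platform without `stat` could fill in the same three fields by other means, and callers would not change.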

-
High Level Interface
+
No Exposed Data
-

The entry points specified in the interface of lib/System must be aimed at - completing some reasonably high level task needed by LLVM. We do not want to - simply wrap each operating system call. It would be preferable to wrap several - operating system calls that are always used in conjunction with one another by - LLVM.

-

For example, consider what is needed to execute a program, wait for it to - complete, and return its result code. On Unix, this involves the following - operating system calls: getenv, fork, execve, and wait. The - correct thing for lib/System to provide is a function, say - ExecuteProgramAndWait, that implements the functionality completely. - what we don't want is wrappers for the operating system calls involved.

-

There must not be a one-to-one relationship between operating - system calls and the System library's interface. Any such interface function - will be suspicious.

+

Any data defined by system libraries (i.e. not defined by lib/System) must + not be exposed through the lib/System interface, even if the header file for + that data is not exposed. As with functions, this prevents inadvertent use + of data that might not exist on all platforms.

-
Minimize Soft Errors
+
Minimize Soft Errors
-

Operating system interfaces will generally provide errors results for every +

Operating system interfaces will generally provide error results for every little thing that could go wrong. In almost all cases, you can divide these error results into two groups: normal/good/soft and abnormal/bad/hard. That is, some of the errors are simply information like "file not found", "insufficient privileges", etc. while other errors are much harder like - "out of space", "bad disk sector", or "system call interrupted". Well call the - first group "soft" errors and the second group "hard" errors.

-

lib/System must always attempt to minimize soft errors and always just - throw a std::string on hard errors. This is a design requirement because the + "out of space", "bad disk sector", or "system call interrupted". We'll call + the first group "soft" errors and the second group "hard" + errors.

+

lib/System must always attempt to minimize soft errors. + This is a design requirement because the minimization of soft errors can affect the granularity and the nature of the interface. In general, if you find that you're wanting to throw soft errors, you must review the granularity of the interface because it is likely you're trying to implement something that is too low level. The rule of thumb is to - provide interface functions that "can't" fail, except when faced with hard - errors.

+ provide interface functions that can't fail, except when faced with + hard errors.

For a trivial example, suppose we wanted to add an "OpenFileForWriting" function. For many operating systems, if the file doesn't exist, attempting to open the file will produce an error. However, lib/System should not
@@ -262,246 +236,83 @@

  • Handle internally the most common normal/good/soft error conditions so the rest of LLVM doesn't have to.
  • - -
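
The "OpenFileForWriting" idea can be sketched as follows. This is a sketch under stated assumptions, not lib/System code: the signature, the `sketch` namespace, and the use of a null return plus a message for hard errors are all hypothetical, and it assumes that the intended handling of a missing file is to create it. It also relies on `fopen` setting `errno`, which POSIX specifies but ISO C does not require.

```cpp
#include <cerrno>
#include <cstdio>
#include <string>

namespace sketch {

// Hypothetical interface: open `path` for writing. The soft error "file does
// not exist" is handled here by creating the file, so callers only ever see
// hard errors, which are reported as a null return plus a message.
std::FILE *OpenFileForWriting(const std::string &path, std::string *errMsg) {
  // First try to open an existing file without truncating it.
  errno = 0;
  if (std::FILE *existing = std::fopen(path.c_str(), "r+b"))
    return existing;

  // Soft error: the file is simply missing, so create it.
  if (errno == ENOENT) {
    if (std::FILE *created = std::fopen(path.c_str(), "w+b"))
      return created;
  }

  // Anything left is treated as a hard error. The message follows the
  // "context: message" shape.
  if (errMsg)
    *errMsg = path + ": Unable to open file for writing";
  return nullptr;
}

} // namespace sketch
```

With this shape, a caller never has to write its own "does the file exist yet?" check before asking for a writable file.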
    
    -Notes:
    -10. The implementation of a lib/System interface can vary drastically between
    -    platforms. That's okay as long as the end result of the interface function is
    -    the same. For example, a function to create a directory is pretty straight
    -    forward on all operating system. System V IPC on the other hand isn't even
    -    supported on all platforms. Instead of "supporting" System V IPC, lib/System
    -    should provide an interface to the basic concept of inter-process 
    -    communications. The implementations might use System V IPC if that was
    -    available or named pipes, or whatever gets the job done effectively for a
    -    given operating system.
    -
    -11. Implementations are separated first by the general class of operating system
    -    as provided by the configure script's $build variable. This variable is used
    -    to create a link from $BUILD_OBJ_ROOT/lib/System/platform to a directory in
    -    $BUILD_SRC_ROOT/lib/System directory with the same name as the $build
    -    variable. This provides a retargetable include mechanism. By using the link's
    -    name (platform) we can actually include the operating specific
    -    implementation. For example, support $build is "Darwin" for MacOS X. If we
    -    place:
    -      #include "platform/File.cpp"
    -    into a a file in lib/System, it will actually include
    -    lib/System/Darwin/File.cpp. What this does is quickly differentiate the basic
    -    class of operating system that will provide the implementation.
    - 
    -12. Implementation files in lib/System need may only do two things: (1) define 
    -    functions and data that is *TRULY* generic (completely platform agnostic) and
    -    (2) #include the platform specific implementation with:
    - 
    -       #include "platform/Impl.cpp"
    - 
    -    where Impl is the name of the implementation files.
    - 
    -13. Platform specific implementation files (platform/Impl.cpp) may only #include
    -    other Impl.cpp files found in directories under lib/System. The order of
    -    inclusion is very important (from most generic to most specific) so that we
    -    don't inadvertently place an implementation in the wrong place. For example,
    -    consider a fictitious implementation file named DoIt.cpp. Here's how the
    -    #includes should work for a Linux platform
    - 
    -    lib/System/DoIt.cpp
    -      #include "platform/DoIt.cpp"        // platform specific impl. of Doit
    -      DoIt
    - 
    -    lib/System/Linux/DoIt.cpp             // impl that works on all Linux 
    -      #include "../Unix/DoIt.cpp"         // generic Unix impl. of DoIt
    -      #include "../Unix/SUS/DoIt.cpp      // SUS specific impl. of DoIt
    -      #include "../Unix/SUS/v3/DoIt.cpp   // SUSv3 specific impl. of DoIt
    - 
    -    Note that the #includes in lib/System/Linux/DoIt.cpp are all optional but
    -    should be used where the implementation of some functionality can be shared
    -    across some set of Unix variants. We don't want to duplicate code across
    -    variants if their implementation could be shared.
    -
    -
    Use Opaque Classes
    -
    -

    no public data

    -

    only primitive typed private/protected data

    -

    data size is "right" for platform, not max of all platforms

    -

    each class corresponds to O/S concept

    + - - -
    -

    To be written.

    +

    None of the lib/System interface functions may be declared with C++ + throw() specifications on them. This requirement makes sure that the + compiler does not insert additional exception handling code into the interface + functions. This is a performance consideration: lib/System functions are at + the bottom of many call chains and as such can be frequently called. We + need them to be as efficient as possible. However, no routines in the + system library should actually throw exceptions.

    - +
    -

    To be written.

    +

    Implementations of the System Library interface are separated by their + general class of operating system. Currently only Unix and Win32 classes are + defined but more could be added for other operating system classifications. + To distinguish which implementation to compile, the code in lib/System uses + the LLVM_ON_UNIX and LLVM_ON_WIN32 #defines provided via configure through the + llvm/Config/config.h file. Each source file in lib/System, after implementing + the generic (operating system independent) functionality, needs to include the + correct implementation using a set of #if defined(LLVM_ON_XYZ) + directives. For example, if we had lib/System/File.cpp, we'd expect to see in + that file:

    +
    
    +  #if defined(LLVM_ON_UNIX)
    +  #include "Unix/File.cpp"
    +  #endif
    +  #if defined(LLVM_ON_WIN32)
    +  #include "Win32/File.cpp"
    +  #endif
    +  
    +

    The implementation in lib/System/Unix/File.cpp should handle all Unix + variants. The implementation in lib/System/Win32/File.cpp should handle all + Win32 variants. What this does is quickly differentiate the basic class of + operating system that will provide the implementation. The specific details + for a given platform must still be determined through the use of + #ifdef.
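
The same selection pattern can be shown in a single self-contained file. This is only a sketch: `JoinPath`, `Separator`, and the `sketch` namespace are hypothetical, and the fallback that guesses `LLVM_ON_UNIX` or `LLVM_ON_WIN32` from `_WIN32` is an assumption standing in for what configure would normally provide through llvm/Config/config.h. The platform blocks are written inline here where the real files would use `#include "Unix/..."` and `#include "Win32/..."`.

```cpp
#include <string>

// Assumption for this sketch only: stand in for llvm/Config/config.h when
// neither macro has been provided by the build.
#if !defined(LLVM_ON_UNIX) && !defined(LLVM_ON_WIN32)
#if defined(_WIN32)
#define LLVM_ON_WIN32 1
#else
#define LLVM_ON_UNIX 1
#endif
#endif

namespace sketch {

// Platform specific part. In the layout described above, each block would be
// a separate file pulled in with #include.
#if defined(LLVM_ON_UNIX)
static const char Separator = '/';
#endif
#if defined(LLVM_ON_WIN32)
static const char Separator = '\\';
#endif

// Generic (operating system independent) part, written exactly once.
// Hypothetical helper that joins a directory and a file name.
std::string JoinPath(const std::string &dir, const std::string &name) {
  if (dir.empty())
    return name;
  if (dir[dir.size() - 1] == Separator)
    return dir + name;
  return dir + Separator + name;
}

} // namespace sketch
```

Only the two small guarded blocks differ between operating system classes; the body of `JoinPath` is shared.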

    - +
    -

    To be written.

    -
    - - - -
    -

    To be written.

    -
    - - - -
    -

    To be written.

    +

    The implementation of a lib/System interface can vary drastically between + platforms. That's okay as long as the end result of the interface function + is the same. For example, a function to create a directory is pretty + straightforward on all operating systems. System V IPC on the other hand isn't even + supported on all platforms. Instead of "supporting" System V IPC, lib/System + should provide an interface to the basic concept of inter-process + communications. The implementations might use System V IPC if that was + available or named pipes, or whatever gets the job done effectively for a + given operating system. In all cases, the interface and the implementation + must be semantically consistent.

    -

    See bug 351 +

    See bug 351 for further details on the progress of this work.

    - - -
    -

    In order to provide different implementations of the lib/System interface - for different platforms, it is necessary for the library to "sense" which - operating system is being compiled for and conditionally compile only the - applicabe parts of the library. While several operating system wrapper - libraries (e.g. APR, ACE) choose to use #ifdef preprocessor statements in - combination with autoconf variable (HAVE_* family), lib/System chooses an - alternate strategy.

    -

    To put it succinctly, the lib/System strategy has traded "#ifdef hell" for - "#include hell". That is, a given implementation file defines one or more - functions for a particular operating system variant. The functions defined in - that file have no #ifdef's to disambiguate the platform since the file is only - compiled on one kind of platform. While this leads to the same function being - imlemented differently in different files, it is our contention that this - leads to better maintenance and easier portability.

    -

    For example, consider a function having different implementations on a - variety of platforms. Many wrapper libraries choose to deal with the different - implementations by using #ifdef, like this:

    -
    
    -      void SomeFunction(void) {
    -      #if defined __LINUX
    -        // .. Linux implementation
    -      #elif defined __WIN32
    -        // .. Win32 implementation
    -      #elif defined __SunOS
    -        // .. SunOS implementation
    -      #else
    -      #warning "Don't know how to implement SomeFunction on this platform"
    -      #endif
    -      }
    -  
    -

    The problem with this is that its very messy to read, especially as the - number of operating systems and their variants grow. The above example is - actually tame compared to what can happen when the implementation depends on - specific flavors and versions of the operating system. In that case you end up - with multiple levels of nested #if statements. This is what we mean by "#ifdef - hell".

    -

    To avoid the situation above, we've choosen to locate all functions for a - given implementation file for a specific operating system into one place. This - has the following advantages:

    -

      -
    • No "#ifdef hell"
    • -
    • When porting, the strategy is quite straight forward: copy the - implementation file from a similar operating system to a new directory and - re-implement them.
    • -
    • Correctness is helped during porting because the new operating system's - implementation is wholly contained in a separate directory. There's no - chance to make an error in the #if statements and affect some other - operating system's implementation.
    • -
    -

    So, given that we have decided to use #include instead of #if to provide - platform specific implementations, there are actually three ways we can go - about doing this. None of them are perfect, but we believe we've chosen the - lesser of the three evils. Given that there is a variable named $OS which - names the platform for which we must build, here's a summary of the three - approaches we could use to determine the correct directory:

    -
      -
    1. Provide the compiler with a -I$(OS) on the command line. This could be - provided in only the lib/System makefile.
    2. -
    3. Use autoconf to transform #include statements in the implementation - files by using substitutions of @OS@. For example, if we had a file, - File.cpp.in, that contained "#include <@OS@/File.cpp>" this would get - transformed to "#include <actual/File.cpp>" where "actual" is the - actual name of the operating system
    4. -
    5. Create a link from $OBJ_DIR/platform to $SRC_DIR/$OS. This allows us to - use a generic directory name to get the correct platform, as in #include - <platform/File.cpp>
    6. -
    -

    Let's look at the pitfalls of each approach.

    -

    In approach #1, we end up with some confusion as to what gets included. - Suppose we have lib/System/File.cpp that includes just File.cpp to get the - platform specific part of the implementation. In this case, the include - directive with the <> syntax will include the right file but the include - directive with the "" syntax will recursively include the same file, - lib/System/File.cpp. In the case of #include <File.cpp>, the -I options - to the compiler are searched first so it works. But in the #include "File.cpp" - case, the current directory is searched first. Furthermore, in both cases, - neither include directive documents which File.cpp is getting included.

    -

    In approach #2, we have the problem of needing to reconfigure repeatedly. - Developer's generally hate that and we don't want lib/System to be a thorn in - everyone's side because it will constantly need updating as operating systems - change and as new operating systems are added. The problem occurs when a new - implementation file is added to the library. First of all, you have to add a - file with the .in suffix, then you have to add that file name to the list of - configurable files in the autoconf/configure.ac file, then you have to run - AutoRegen.sh to rebuild the configure script, then you have to run the - configure script. This is deemed to be a pretty large hassle.

    -

    In approach #3, we have the problem that not all platforms support links. - Fortunately the autoconf macro used to create the link can compensate for - this. If a link can't be made, the configure script will copy the correct - directory from $BUILD_SRC_DIR to $BUILD_OBJ_DIR under the new name. The only - problem with this is that if a copy is made, the copy doesn't get updated if - the programmer adds or modifies files in the $BUILD_SRC_DIR. A reconfigure or - manual copying is needed to get things to compile.

    -

    The approach we have taken in lib/System is #3. Here's why:

    -

      -
    • Approach #1 is rejected because it doesn't document what's actually - getting included and the potential for mistakes with alternate include - directive forms is high.
    • -
    • Approach #2 are both viable and only really impact development when new - files are added to the library.
    • -
    • However, approach #2 impacts every new file on every platform all the - time. With approach #3, only those platforms not supporting links will be - affected. The number of platforms not supporting links is very small and - they are generally archaic.
    • -
    • Given the above, approach #3 seems to have the least impact.
    • -
    -
    - - - -
    -

    The linux implementation of the system library will always be the - reference implementation. This means that (a) the concepts defined by the - linux must be identically replicated in the other implementations and (b) the - linux implementation must always be complete (provide implementations for all - concepts).

    -
    -
    Reid Spencer
    - LLVM Compiler Infrastructure
    + LLVM Compiler Infrastructure
    Last modified: $Date$