From d283cb12f217d7a853671634fd4410333c423bbb Mon Sep 17 00:00:00 2001 From: Chris Lattner Date: Wed, 22 Jul 2009 05:40:54 +0000 Subject: [PATCH] add some more topics to the coding standards doc: * Use Early Exits and 'continue' to Simplify Code * Turn Predicate Loops into Predicate Functions * Spaces Before Parentheses * Namespace Indentation * Anonymous Namespaces git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@76723 91177308-0d34-0410-b5e6-96231b3b80d8 --- docs/CodingStandards.html | 489 +++++++++++++++++++++++++++++++++++--- 1 file changed, 456 insertions(+), 33 deletions(-) diff --git a/docs/CodingStandards.html b/docs/CodingStandards.html index 79c816ef65f..663be014d21 100644 --- a/docs/CodingStandards.html +++ b/docs/CodingStandards.html @@ -41,8 +41,10 @@
  • #include as Little as Possible
  • Keep "internal" Headers Private
  • -
  • #include <iostream> is - forbidden
  • +
  • Use Early Exits and 'continue' to Simplify + Code
  • +
  • Turn Predicate Loops into Predicate + Functions
  • The Low Level Issues
      @@ -52,9 +54,20 @@ classes in headers
    1. Don't evaluate end() every time through a loop
    2. -
    3. Prefer Preincrement
    4. +
    5. #include <iostream> is + forbidden
    6. Avoid std::endl
  • + +
  • Microscopic Details +
      +
    1. Spaces Before Parentheses
    2. +
    3. Prefer Preincrement
    4. +
    5. Namespace Indentation
    6. +
    7. Anonymous Namespaces
    8. +
  • + +
  • See Also
  • @@ -419,6 +432,7 @@ declare the symbol. This can lead to problems at link time.

    The High Level Issues
    + @@ -504,34 +518,174 @@ class itself... just make them private (or protected), and all is well.

    - #include <iostream> is forbidden + Use Early Exits and 'continue' to Simplify Code
    -

    The use of #include <iostream> in library files is -hereby forbidden. The primary reason for doing this is to -support clients using LLVM libraries as part of larger systems. In particular, -we statically link LLVM into some dynamic libraries. Even if LLVM isn't used, -the static c'tors are run whenever an application start up that uses the dynamic -library. There are two problems with this:

    +

    When reading code, keep in mind how much state and how many previous +decisions have to be remembered by the reader to understand a block of code. +Aim to reduce indentation where possible when it doesn't make it more difficult +to understand the code. One great way to do this is by making use of early +exits and the 'continue' keyword in long loops. As an example of using an early +exit from a function, consider this "bad" code:

    -
      -
    1. The time to run the static c'tors impacts startup time of - applications—a critical time for GUI apps.
    2. -
    3. The static c'tors cause the app to pull many extra pages of memory off the - disk: both the code for the static c'tors in each .o file and the - small amount of data that gets touched. In addition, touched/dirty pages - put more pressure on the VM system on low-memory machines.
    4. -
    +
    +
    +Value *DoSomething(Instruction *I) {
    +  if (!isa<TerminatorInst>(I) &&
    +      I->hasOneUse() && SomeOtherThing(I)) {
    +    ... some long code ....
    +  }
    +  
    +  return 0;
    +}
    +
    +
    -

    Note that using the other stream headers (<sstream> for -example) is allowed normally, it is just <iostream> that is -causing problems.

    +

    This code has several problems if the body of the 'if' is large. When you're +looking at the top of the function, it isn't immediately clear that this +only does interesting things with non-terminator instructions, and only +applies to things with the other predicates. Second, it is relatively difficult +to describe (in comments) why these predicates are important because the if +statement makes it difficult to lay out the comments. Third, when you're deep +within the body of the code, it is indented an extra level. Finally, when +reading the top of the function, it isn't clear what the result is if the +predicate isn't true, you have to read to the end of the function to know that +it returns null.

    -

    The preferred replacement for stream functionality is the -llvm::raw_ostream class (for writing to output streams of various -sorts) and the llvm::MemoryBuffer API (for reading in files).

    +

    It is much preferred to format the code like this:

    + +
    +
    +Value *DoSomething(Instruction *I) {
    +  // Terminators never need 'something' done to them because, ... 
    +  if (isa<TerminatorInst>(I))
    +    return 0;
    +
    +  // We conservatively avoid transforming instructions with multiple uses
    +  // because goats like cheese.
    +  if (!I->hasOneUse())
    +    return 0;
    +
    +  // This is really just here for example.
    +  if (!SomeOtherThing(I))
    +    return 0;
    +    
    +  ... some long code ....
    +}
    +
    +
    + +

    This fixes these problems. A similar problem frequently happens in for +loops. A silly example is something like this:

    + +
    +
    +  for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ++II) {
    +    if (BinaryOperator *BO = dyn_cast<BinaryOperator>(II)) {
    +      Value *LHS = BO->getOperand(0);
    +      Value *RHS = BO->getOperand(1);
    +      if (LHS != RHS) {
    +        ...
    +      }
    +    }
    +  }
    +
    +
    + +

    When you have very very small loops, this sort of structure is fine, but if +it exceeds more than 10-15 lines, it becomes difficult for people to read and +understand at a glance. +The problem with this sort of code is that it gets very nested very quickly, +meaning that the reader of the code has to keep a lot of context in their brain +to remember what is going immediately on in the loop, because they don't know +if/when the if conditions will have elses etc. It is strongly preferred to +structure the loop like this:

    + +
    +
    +  for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ++II) {
    +    BinaryOperator *BO = dyn_cast<BinaryOperator>(II);
    +    if (!BO) continue;
    +    
    +    Value *LHS = BO->getOperand(0);
    +    Value *RHS = BO->getOperand(1);
    +    if (LHS == RHS) continue;
    +  }
    +
    +
    + +

    This has all the benefits of using early exits from functions: it reduces +nesting of the loop, it makes it easier to describe why the conditions are true, +and it makes it obvious to the reader that there is no "else" coming up that +they have to push context into their brain for. If a loop is large, this can +be a big understandability win.

    + +
    + + + +
    + Turn Predicate Loops into Predicate Functions +
    + +
    + +

    It is very common to write inline functions that just compute a boolean + value. There are a number of ways that people commonly write these, but an + example of this sort of thing is:

    + +
    +
    +  bool FoundFoo = false;
    +  for (unsigned i = 0, e = BarList.size(); i != e; ++i)
    +    if (BarList[i]->isFoo()) {
    +      FoundFoo = true;
    +      break;
    +    }
    +    
    +  if (FoundFoo) {
    +    ...
    +  }
    +
    +
    + +

    This sort of code is awkward to write, and is almost always a bad sign. +Instead of this sort of loop, we strongly prefer to use a predicate function +(which may be static) that uses +early exits to compute the predicate. Code like +this would be preferred: +

    + + +
    +
    +/// ListContainsFoo - Return true if the specified list has an element that is
    +/// a foo.
    +static bool ListContainsFoo(const std::vector<Bar*> &List) {
    +  for (unsigned i = 0, e = List.size(); i != e; ++i)
    +    if (List[i]->isFoo())
    +      return true;
    +  return false;
    +}
    +...
    +
    +  if (ListContainsFoo(BarList)) {
    +    ...
    +  }
    +
    +
    + +

    There are many reasons for doing this: it reduces indentation and factors out +code which can often be shared by other code that checks for the same predicate. +More importantly, it forces you to pick a name for the function, and +forces you to write a comment for it. In this silly example, this doesn't add +much value. However, if the condition is complex, this can make it a lot easier +for the reader to understand the code that queries for this predicate. Instead +of being faced with the in-line details of we check to see if the BarList +contains a foo, we can trust the function name and continue reading with better +locality.

    @@ -540,6 +694,7 @@ sorts) and the llvm::MemoryBuffer API (for reading in files).

    The Low Level Issues
    + @@ -726,10 +881,134 @@ prefer it.

    + +
    + #include <iostream> is forbidden +
    + +
    + +

    The use of #include <iostream> in library files is +hereby forbidden. The primary reason for doing this is to +support clients using LLVM libraries as part of larger systems. In particular, +we statically link LLVM into some dynamic libraries. Even if LLVM isn't used, +the static c'tors are run whenever an application start up that uses the dynamic +library. There are two problems with this:

    + +
      +
    1. The time to run the static c'tors impacts startup time of + applications—a critical time for GUI apps.
    2. +
    3. The static c'tors cause the app to pull many extra pages of memory off the + disk: both the code for the static c'tors in each .o file and the + small amount of data that gets touched. In addition, touched/dirty pages + put more pressure on the VM system on low-memory machines.
    4. +
    + +

    Note that using the other stream headers (<sstream> for +example) is allowed normally, it is just <iostream> that is +causing problems.

    + +

    The preferred replacement for stream functionality is the +llvm::raw_ostream class (for writing to output streams of various +sorts) and the llvm::MemoryBuffer API (for reading in files).

    + +
    + + + +
    + Avoid std::endl +
    + +
    + +

    The std::endl modifier, when used with iostreams outputs a newline +to the output stream specified. In addition to doing this, however, it also +flushes the output stream. In other words, these are equivalent:

    + +
    +
    +std::cout << std::endl;
    +std::cout << '\n' << std::flush;
    +
    +
    + +

    Most of the time, you probably have no reason to flush the output stream, so +it's better to use a literal '\n'.

    + +
    + + + +
    + Microscopic Details +
    + + +

    This section describes preferred low-level formatting guidelines along with +reasoning on why we prefer them.

    + + +
    + Spaces Before Parentheses +
    + +
    + +

    We prefer to put a space before a parentheses only in control flow +statements, but not in normal function call expressions and function-like +macros. For example, this is good:

    + +
    +
    +  if (x) ...
    +  for (i = 0; i != 100; ++i) ...
    +  while (llvm_rocks) ...
    +
    +  somefunc(42);
    +  assert(3 != 4 && "laws of math are failing me");
    +  
    +  a = foo(42, 92) + bar(x);
    +  
    +
    + +

    ... and this is bad:

    + +
    +
    +  if(x) ...
    +  for(i = 0; i != 100; ++i) ...
    +  while(llvm_rocks) ...
    +
    +  somefunc (42);
    +  assert (3 != 4 && "laws of math are failing me");
    +  
    +  a = foo (42, 92) + bar (x);
    +
    +
    + +

    The reason for doing this is not completely arbitrary. This style makes + control flow operators stand out more, and makes expressions flow better. The + function call operator binds very tightly as a postfix operator. Putting + a space after a function name (as in the last example) makes it appear that + the code might bind the arguments of the left-hand-side of a binary operator + with the argument list of a function and the name of the right side. More + specifically, it is easy to misread the "a" example as:

    + +
    +
    +  a = foo ((42, 92) + bar) (x);
    +
    +
    + +

    ... when skimming through the code. By avoiding a space in a function, we +avoid this misinterpretation.

    + +
    - Prefer Preincrement + Prefer Preincrement
    @@ -749,27 +1028,171 @@ get in the habit of always using preincrement, and you won't have a problem.

    -

    The std::endl modifier, when used with iostreams outputs a newline -to the output stream specified. In addition to doing this, however, it also -flushes the output stream. In other words, these are equivalent:

    +

    +In general, we strive to reduce indentation where ever possible. This is useful +because we want code to fit into 80 columns without +wrapping horribly, but also because it makes it easier to understand the code. +Namespaces are a funny thing: they are often large, and we often desire to put +lots of stuff into them (so they can be large). Other times they are tiny, +because they just hold an enum or something similar. In order to balance this, +we use different approaches for small versus large namespaces. +

    + +

    +If a namespace definition is small and easily fits on a screen (say, +less than 35 lines of code), then you should indent its body. Here's an +example: +

    -std::cout << std::endl;
    -std::cout << '\n' << std::flush;
    +/// SomeCrazyThing - This namespace contains flags for ...
    +namespace SomeCrazyThing {
    +  enum foo {
    +    /// X - This is the X flag, which is ...
    +    X = 1, 
    +    
    +    /// Y - This is the Y flag, which is ...
    +    Y = 2,
    +    
    +    /// Z - This is the Z flag, which is ...
    +    Z = 4,
    +    
    +    /// ALL_FLAGS - This is the union of all flags.
    +    ALL_FLAGS = 7
    +  };
    +}
     
    -

    Most of the time, you probably have no reason to flush the output stream, so -it's better to use a literal '\n'.

    +

    Since the body is small, indenting adds value because it makes it very clear +where the namespace starts and ends, and it is easy to take the whole thing in +in one "gulp" when reading the code. If the blob of code in the namespace is +larger (as it typically is in a header in the llvm or clang namespaces), do not +indent the code, and add a comment indicating what namespace is being closed. +For example:

    + +
    +
    +namespace llvm {
    +namespace knowledge {
    +
    +/// Grokable - This class represents things that Smith can have an intimate
    +/// understanding of and contains the data associated with it.
    +class Grokable {
    +...
    +public:
    +  explicit Grokable() { ... }
    +  virtual ~Grokable() = 0;
    +  
    +  ...
     
    +};
    +
    +} // end namespace knowledge
    +} // end namespace llvm
    +
    +
    + +

    Because the class is large, we don't expect that the reader can easily +understand the entire concept in a glance, and the end of the file (where the +namespaces end) may be a long ways away from the place they open. As such, +indenting the contents of the namespace doesn't add any value, and detracts from +the readability of the class. In these cases it is best to not indent +the contents of the namespace.

    + +
    + + + +
    + +

    A common topic after talking about namespaces is anonymous namespaces. +Anonymous namespaces are a great language feature that tells the C++ compiler +that the contents of the namespace are only visible within the current +translation unit, allowing more aggressive optimization and eliminating the +possibility of symbol name collisions. Anonymous namespaces are to C++ as +"static" is to C functions and global variables. While "static" is available +in C++, anonymous namespaces are more general: they can make entire classes +private to a file.

    + +

    The problem with anonymous namespaces is that they naturally want to +encourage indentation of their body, and they reduce locality of reference: if +you see a random function definition in a C++ file, it is easy to see if it is +marked static, but seeing if it is in an anonymous namespace requires scanning +a big chunk of the file.

    + +

    Because of this, we have a simple guideline: make anonymous namespaces as +small as possible, and only use them for class declarations. For example, this +is good:

    + +
    +
    +namespace {
    +  class StringSort {
    +  ...
    +  public:
    +    StringSort(...)
    +    bool operator<(const char *RHS) const;
    +  };
    +} // end anonymous namespace
    +
    +static void Helper() { 
    +  ... 
    +}
    +
    +bool StringSort::operator<(const char *RHS) const {
    +  ...
    +}
    +
    +
    +
    + +

    This is bad:

    + + +
    +
    +namespace {
    +class StringSort {
    +...
    +public:
    +  StringSort(...)
    +  bool operator<(const char *RHS) const;
    +};
    +
    +void Helper() { 
    +  ... 
    +}
    +
    +bool StringSort::operator<(const char *RHS) const {
    +  ...
    +}
    +
    +} // end anonymous namespace
    +
    +
    +
    + + +

    This is bad specifically because if you're looking at "Helper" in the middle +of a large C++ file, that you have no immediate way to tell if it is local to +the file. When it is marked static explicitly, this is immediately obvious. +Also, there is no reason to enclose the definition of "operator<" in the +namespace since it was declared there. +

    + +
    + +
    -- 2.34.1