From d283cb12f217d7a853671634fd4410333c423bbb Mon Sep 17 00:00:00 2001
From: Chris Lattner
@@ -52,9 +54,20 @@
classes in headers
The use of #include <iostream> in library files is -hereby forbidden. The primary reason for doing this is to -support clients using LLVM libraries as part of larger systems. In particular, -we statically link LLVM into some dynamic libraries. Even if LLVM isn't used, -the static c'tors are run whenever an application start up that uses the dynamic -library. There are two problems with this:
+When reading code, keep in mind how much state and how many previous +decisions have to be remembered by the reader to understand a block of code. +Aim to reduce indentation where possible when it doesn't make it more difficult +to understand the code. One great way to do this is by making use of early +exits and the 'continue' keyword in long loops. As an example of using an early +exit from a function, consider this "bad" code:
-+Value *DoSomething(Instruction *I) { + if (!isa<TerminatorInst>(I) && + I->hasOneUse() && SomeOtherThing(I)) { + ... some long code .... + } + + return 0; +} ++
Note that using the other stream headers (<sstream> for -example) is allowed normally, it is just <iostream> that is -causing problems.
+This code has several problems if the body of the 'if' is large. When you're +looking at the top of the function, it isn't immediately clear that this +only does interesting things with non-terminator instructions, and only +applies to things with the other predicates. Second, it is relatively difficult +to describe (in comments) why these predicates are important because the if +statement makes it difficult to lay out the comments. Third, when you're deep +within the body of the code, it is indented an extra level. Finally, when +reading the top of the function, it isn't clear what the result is if the +predicate isn't true, you have to read to the end of the function to know that +it returns null.
-The preferred replacement for stream functionality is the -llvm::raw_ostream class (for writing to output streams of various -sorts) and the llvm::MemoryBuffer API (for reading in files).
+It is much preferred to format the code like this:
+ ++Value *DoSomething(Instruction *I) { + // Terminators never need 'something' done to them because, ... + if (isa<TerminatorInst>(I)) + return 0; + + // We conservatively avoid transforming instructions with multiple uses + // because goats like cheese. + if (!I->hasOneUse()) + return 0; + + // This is really just here for example. + if (!SomeOtherThing(I)) + return 0; + + ... some long code .... +} ++
This fixes these problems. A similar problem frequently happens in for +loops. A silly example is something like this:
+ ++ for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ++II) { + if (BinaryOperator *BO = dyn_cast<BinaryOperator>(II)) { + Value *LHS = BO->getOperand(0); + Value *RHS = BO->getOperand(1); + if (LHS != RHS) { + ... + } + } + } ++
When you have very very small loops, this sort of structure is fine, but if +it exceeds more than 10-15 lines, it becomes difficult for people to read and +understand at a glance. +The problem with this sort of code is that it gets very nested very quickly, +meaning that the reader of the code has to keep a lot of context in their brain +to remember what is going immediately on in the loop, because they don't know +if/when the if conditions will have elses etc. It is strongly preferred to +structure the loop like this:
+ ++ for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ++II) { + BinaryOperator *BO = dyn_cast<BinaryOperator>(II); + if (!BO) continue; + + Value *LHS = BO->getOperand(0); + Value *RHS = BO->getOperand(1); + if (LHS == RHS) continue; + } ++
This has all the benefits of using early exits from functions: it reduces +nesting of the loop, it makes it easier to describe why the conditions are true, +and it makes it obvious to the reader that there is no "else" coming up that +they have to push context into their brain for. If a loop is large, this can +be a big understandability win.
+ +It is very common to write inline functions that just compute a boolean + value. There are a number of ways that people commonly write these, but an + example of this sort of thing is:
+ ++ bool FoundFoo = false; + for (unsigned i = 0, e = BarList.size(); i != e; ++i) + if (BarList[i]->isFoo()) { + FoundFoo = true; + break; + } + + if (FoundFoo) { + ... + } ++
This sort of code is awkward to write, and is almost always a bad sign. +Instead of this sort of loop, we strongly prefer to use a predicate function +(which may be static) that uses +early exits to compute the predicate. Code like +this would be preferred: +
+ + ++/// ListContainsFoo - Return true if the specified list has an element that is +/// a foo. +static bool ListContainsFoo(const std::vector<Bar*> &List) { + for (unsigned i = 0, e = List.size(); i != e; ++i) + if (List[i]->isFoo()) + return true; + return false; +} +... + + if (ListContainsFoo(BarList)) { + ... + } ++
There are many reasons for doing this: it reduces indentation and factors out +code which can often be shared by other code that checks for the same predicate. +More importantly, it forces you to pick a name for the function, and +forces you to write a comment for it. In this silly example, this doesn't add +much value. However, if the condition is complex, this can make it a lot easier +for the reader to understand the code that queries for this predicate. Instead +of being faced with the in-line details of we check to see if the BarList +contains a foo, we can trust the function name and continue reading with better +locality.
The use of #include <iostream> in library files is +hereby forbidden. The primary reason for doing this is to +support clients using LLVM libraries as part of larger systems. In particular, +we statically link LLVM into some dynamic libraries. Even if LLVM isn't used, +the static c'tors are run whenever an application start up that uses the dynamic +library. There are two problems with this:
+ +Note that using the other stream headers (<sstream> for +example) is allowed normally, it is just <iostream> that is +causing problems.
+ +The preferred replacement for stream functionality is the +llvm::raw_ostream class (for writing to output streams of various +sorts) and the llvm::MemoryBuffer API (for reading in files).
+ +The std::endl modifier, when used with iostreams outputs a newline +to the output stream specified. In addition to doing this, however, it also +flushes the output stream. In other words, these are equivalent:
+ ++std::cout << std::endl; +std::cout << '\n' << std::flush; ++
Most of the time, you probably have no reason to flush the output stream, so +it's better to use a literal '\n'.
+ +This section describes preferred low-level formatting guidelines along with +reasoning on why we prefer them.
+ + + + +We prefer to put a space before a parentheses only in control flow +statements, but not in normal function call expressions and function-like +macros. For example, this is good:
+ ++ if (x) ... + for (i = 0; i != 100; ++i) ... + while (llvm_rocks) ... + + somefunc(42); + assert(3 != 4 && "laws of math are failing me"); + + a = foo(42, 92) + bar(x); ++
... and this is bad:
+ ++ if(x) ... + for(i = 0; i != 100; ++i) ... + while(llvm_rocks) ... + + somefunc (42); + assert (3 != 4 && "laws of math are failing me"); + + a = foo (42, 92) + bar (x); ++
The reason for doing this is not completely arbitrary. This style makes + control flow operators stand out more, and makes expressions flow better. The + function call operator binds very tightly as a postfix operator. Putting + a space after a function name (as in the last example) makes it appear that + the code might bind the arguments of the left-hand-side of a binary operator + with the argument list of a function and the name of the right side. More + specifically, it is easy to misread the "a" example as:
+ ++ a = foo ((42, 92) + bar) (x); ++
... when skimming through the code. By avoiding a space in a function, we +avoid this misinterpretation.
+ +The std::endl modifier, when used with iostreams outputs a newline -to the output stream specified. In addition to doing this, however, it also -flushes the output stream. In other words, these are equivalent:
++In general, we strive to reduce indentation where ever possible. This is useful +because we want code to fit into 80 columns without +wrapping horribly, but also because it makes it easier to understand the code. +Namespaces are a funny thing: they are often large, and we often desire to put +lots of stuff into them (so they can be large). Other times they are tiny, +because they just hold an enum or something similar. In order to balance this, +we use different approaches for small versus large namespaces. +
+ ++If a namespace definition is small and easily fits on a screen (say, +less than 35 lines of code), then you should indent its body. Here's an +example: +
-std::cout << std::endl; -std::cout << '\n' << std::flush; +/// SomeCrazyThing - This namespace contains flags for ... +namespace SomeCrazyThing { + enum foo { + /// X - This is the X flag, which is ... + X = 1, + + /// Y - This is the Y flag, which is ... + Y = 2, + + /// Z - This is the Z flag, which is ... + Z = 4, + + /// ALL_FLAGS - This is the union of all flags. + ALL_FLAGS = 7 + }; +}
Most of the time, you probably have no reason to flush the output stream, so -it's better to use a literal '\n'.
+Since the body is small, indenting adds value because it makes it very clear +where the namespace starts and ends, and it is easy to take the whole thing in +in one "gulp" when reading the code. If the blob of code in the namespace is +larger (as it typically is in a header in the llvm or clang namespaces), do not +indent the code, and add a comment indicating what namespace is being closed. +For example:
+ ++namespace llvm { +namespace knowledge { + +/// Grokable - This class represents things that Smith can have an intimate +/// understanding of and contains the data associated with it. +class Grokable { +... +public: + explicit Grokable() { ... } + virtual ~Grokable() = 0; + + ... +}; + +} // end namespace knowledge +} // end namespace llvm ++
Because the class is large, we don't expect that the reader can easily +understand the entire concept in a glance, and the end of the file (where the +namespaces end) may be a long ways away from the place they open. As such, +indenting the contents of the namespace doesn't add any value, and detracts from +the readability of the class. In these cases it is best to not indent +the contents of the namespace.
+ +A common topic after talking about namespaces is anonymous namespaces. +Anonymous namespaces are a great language feature that tells the C++ compiler +that the contents of the namespace are only visible within the current +translation unit, allowing more aggressive optimization and eliminating the +possibility of symbol name collisions. Anonymous namespaces are to C++ as +"static" is to C functions and global variables. While "static" is available +in C++, anonymous namespaces are more general: they can make entire classes +private to a file.
+ +The problem with anonymous namespaces is that they naturally want to +encourage indentation of their body, and they reduce locality of reference: if +you see a random function definition in a C++ file, it is easy to see if it is +marked static, but seeing if it is in an anonymous namespace requires scanning +a big chunk of the file.
+ +Because of this, we have a simple guideline: make anonymous namespaces as +small as possible, and only use them for class declarations. For example, this +is good:
+ ++namespace { + class StringSort { + ... + public: + StringSort(...) + bool operator<(const char *RHS) const; + }; +} // end anonymous namespace + +static void Helper() { + ... +} + +bool StringSort::operator<(const char *RHS) const { + ... +} + ++
This is bad:
+ + ++namespace { +class StringSort { +... +public: + StringSort(...) + bool operator<(const char *RHS) const; +}; + +void Helper() { + ... +} + +bool StringSort::operator<(const char *RHS) const { + ... +} + +} // end anonymous namespace + ++
This is bad specifically because if you're looking at "Helper" in the middle +of a large C++ file, that you have no immediate way to tell if it is local to +the file. When it is marked static explicitly, this is immediately obvious. +Also, there is no reason to enclose the definition of "operator<" in the +namespace since it was declared there. +
+ +