Optimize frequently inlined FBString methods
Summary:
Almost every method of `fbstring` needs to perform category
dispatching. The category constants are `size_t`, which become 8-byte
immediate values in the dispatching code, so even a simple `if
(category() == Category::isSmall)` is quite large. When inlined
hundreds of thousands of times, this adds up.
This diff redefines the category type to be 1 byte (without changing
the ABI). It also makes `size()` and `c_str()` branch-free, which is
unlikely to have a measurable perf impact but saves a few bytes per
inlined call site.
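A minimal sketch of the idea, not the actual folly code: it assumes a
little-endian, 64-bit target and a 24-byte representation. `Rep`,
`kLastByte`, and `kCategoryMask` are hypothetical names, and the
`isMedium`/`isLarge` values are illustrative. On little-endian, the byte
at offset 0x17 is the most significant byte of the 64-bit word at 0x10,
so a one-byte constant can address the same bits the 8-byte constants
did.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 24-byte representation; assumes little-endian, 64-bit.
struct Rep {
  alignas(8) unsigned char bytes[24];
};

constexpr std::size_t kLastByte = sizeof(Rep) - 1;  // offset 0x17
constexpr std::uint8_t kCategoryMask = 0xC0;        // top two bits of last byte

// One-byte category constants instead of size_t-sized ones.
// The medium/large values here are assumptions made for illustration.
enum class Category : std::uint8_t {
  isSmall = 0x00,
  isMedium = 0x80,
  isLarge = 0x40,
};

// Dispatch loads and masks a single byte, so comparisons can be encoded
// with 1-byte immediates rather than a 10-byte movabs.
inline Category category(const Rep& r) {
  return static_cast<Category>(r.bytes[kLastByte] & kCategoryMask);
}

// Reset to an empty small string with two byte stores: the last byte
// holds the spare capacity (23) with category bits 00, and byte 0 is
// the terminator.
inline void reset(Rep& r) {
  r.bytes[kLastByte] = static_cast<unsigned char>(kLastByte);
  r.bytes[0] = '\0';
}
```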
Generated code for some small functions:
- `reset()`
Before:
```
48 ba 00 00 00 00 00 movabs $0x1700000000000000,%rdx
00 00 17
48 89 f8 mov %rdi,%rax
c6 07 00 movb $0x0,(%rdi)
48 89 57 10 mov %rdx,0x10(%rdi)
```
20 bytes
After:
```
48 89 f8 mov %rdi,%rax
c6 47 17 17 movb $0x17,0x17(%rdi)
c6 07 00 movb $0x0,(%rdi)
```
10 bytes
- `c_str()`
Before:
```
48 b8 00 00 00 00 00 movabs $0xc000000000000000,%rax
00 00 c0
48 85 47 10 test %rax,0x10(%rdi)
74 08 je 401fd8
48 8b 07 mov (%rdi),%rax
c3 retq
0f 1f 40 00 nopl 0x0(%rax)
48 89 f8 mov %rdi,%rax
```
26 bytes (without the `retq`)
After:
```
f6 47 17 c0 testb $0xc0,0x17(%rdi)
48 89 f8 mov %rdi,%rax
48 0f 45 07 cmovne (%rdi),%rax
```
11 bytes
- `size()`
Before:
```
48 b8 00 00 00 00 00 movabs $0xc000000000000000,%rax
00 00 c0
48 85 47 10 test %rax,0x10(%rdi)
74 08 je 401fa8
48 8b 47 08 mov 0x8(%rdi),%rax
c3 retq
0f 1f 00 nopl (%rax)
48 0f be 57 17 movsbq 0x17(%rdi),%rdx
b8 17 00 00 00 mov $0x17,%eax
48 29 d0 sub %rdx,%rax
```
36 bytes (without the `retq`)
After:
```
0f b6 57 17 movzbl 0x17(%rdi),%edx
b8 17 00 00 00 mov $0x17,%eax
48 29 d0 sub %rdx,%rax
48 0f 48 47 08 cmovs 0x8(%rdi),%rax
```
17 bytes
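The branch-free shape of `size()` and `c_str()` can be sketched roughly
as below. This is an illustration under assumptions, not the code in
this diff: it assumes little-endian, 64-bit, with the data pointer at
offset 0, the size at offset 8, and the category bits in the top two
bits of byte 0x17. `Rep`, `rep_size`, and `rep_c_str` are hypothetical
names. Whether a compiler actually emits `cmov` for the conditional
expressions depends on the compiler and flags.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(std::size_t) == 8 && sizeof(const char*) == 8,
              "sketch assumes a 64-bit target");

// Hypothetical 24-byte representation; assumes little-endian, 64-bit.
struct Rep {
  alignas(8) unsigned char bytes[24];
};

// Load both candidate sizes unconditionally, then select. For a small
// string the last byte is (23 - size) and is at most 23; for heap
// strings a category bit is set, so the byte exceeds 23 and the
// subtraction goes negative.
inline std::size_t rep_size(const Rep& r) {
  std::size_t heapSize;
  std::memcpy(&heapSize, r.bytes + 8, sizeof heapSize);
  std::ptrdiff_t maybeSmall =
      std::ptrdiff_t(23) - std::ptrdiff_t(r.bytes[23]);
  return maybeSmall >= 0 ? static_cast<std::size_t>(maybeSmall) : heapSize;
}

// Same pattern for the data pointer: compute both, select on the
// category bits.
inline const char* rep_c_str(const Rep& r) {
  const char* heapPtr;
  std::memcpy(&heapPtr, r.bytes, sizeof heapPtr);
  const char* inlinePtr = reinterpret_cast<const char*>(r.bytes);
  return (r.bytes[23] & 0xC0) == 0 ? inlinePtr : heapPtr;
}
```

Both functions read the heap fields even for small strings, which is
safe here because those bytes are part of the same 24-byte object.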
Reviewed By: philippv
Differential Revision: D3957276
fbshipit-source-id: ef40d82bbbb0456b1044421cd02133c268abe39b