Summary:
* folly/String.cpp (toLowerAscii): Fix two errors: the most important
would cause unaligned accesses. This would cause a performance loss
in general, but would also result in segfaults on ARM processes.
In addition, three conditionals were wrong, further limiting
the performance of this code: switch those "<" to "<=".
Test Plan:
Run this to exercise existing tests:
fbconfig folly/test:string_test && fbmake runtests_opt
Run this to generate timing stats (before and after this change), e.g.,
fbconfig folly/test:string_benchmark && fbmake opt
_bin/folly/test/string_benchmark > TIMING-toLower-old
These numbers show a 1.6% speed increase with this change:
--- TIMING-toLower-old 2014-07-14 16:51:12.
793523778 -0700
+++ TIMING-toLower-new 2014-07-14 16:49:45.
815119145 -0700
@@ -1,6 +1,6 @@
============================================================================
folly/test/StringBenchmark.cpp relative time/iter iters/s
============================================================================
-libc_tolower 1.06us 941.91K
-folly_toLowerAscii 89.99ns 11.11M
+libc_tolower 1.06us 941.90K
+folly_toLowerAscii 88.57ns 11.29M
============================================================================
Reviewed By: brianp@fb.com
Subscribers:
FB internal diff:
D1434585
Tasks:
4696800
Blame Revision:
D1421056
n = std::min(n, length);
size_t offset = 0;
if (n != 0) {
+ n = std::min(4 - n, length);
do {
toLowerAscii8(str[offset]);
offset++;
n = (size_t)(str + offset);
n &= kAlignMask64;
- if ((n != 0) && (offset + 4 < length)) {
+ if ((n != 0) && (offset + 4 <= length)) {
// The next address is 32-bit aligned but not 64-bit aligned.
// Convert the next 4 bytes in order to get to the 64-bit aligned
// part of the input.
}
// Convert 8 characters at a time
- while (offset + 8 < length) {
+ while (offset + 8 <= length) {
toLowerAscii64(*(uint64_t*)(str + offset));
offset += 8;
}
// Convert 4 characters at a time
- while (offset + 4 < length) {
+ while (offset + 4 <= length) {
toLowerAscii32(*(uint32_t*)(str + offset));
offset += 4;
}