Programming Languages Research Group: Git

[DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking

When AA is being used, non-aliasing stores are canonicalized to use the same
chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of
this by looking only as users of a store's chain operand. However, user
iteration is not result-number specific, we need to check that the use is as a
chain operand, and not via some other operand. It is certainly possible to have
another potentially-aliasing store, which shares the first's base pointer, and
uses the first's chain's node via some other operand.

Failure to catch this situation caused, at least in the included test case, an
assert later because the relative sequence-number ordering caused later
replacement to create a cycle in the DAG.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248698 91177308-0d34-0410-b5e6-96231b3b80d8

Remove 'const' from some ArrayRefs. ArrayRefs are already immutable. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248693 91177308-0d34-0410-b5e6-96231b3b80d8

AsmWriter: Print the argument names in declarations while debugging

When llvm declarations have argument names, it's helpful to actually
print those names when debugging. Arguably, it'd be nice to print them
all the time, but that would mean the IR we output wouldn't round trip
through bitcode, which doesn't store the names.

Make the varous print() methods in AsmWriter optionally print "for
debug" and set that flag in the dump() methods. The only thing this
does differently for now is print the argument names in declarations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248692 91177308-0d34-0410-b5e6-96231b3b80d8

Silence clang warning: variable ‘Status’ set but not used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248691 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] identical instructions don't compute equal values

Before this change `HasSameValue` would return true for distinct
`alloca` instructions if they happened to be allocating the same
type (`alloca` instructions are not specified as reading memory). This
change adds an explicit whitelist of instruction types for which
"identical" instructions compute the same value.

Fixes PR24952.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248690 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] fold zexts and constants into a phi (PR24766)

This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766

We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:

#include <stdbool.h>

bool switchy(char x1, char x2, char condition) {
 bool conditionMet = false;
 switch (condition) {
 case 0: conditionMet = (x1 == x2); break;
 case 1: conditionMet = (x1 <= x2); break;
 }
 return conditionMet;
}

bool switchy2(char x1, char x2, char condition) {
 switch (condition) {
 case 0: return (x1 == x2);
 case 1: return (x1 <= x2);
 }
 return false;
}

As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi
args mixed in with the casts ops. It seems like a corner case, but if we don't catch it,
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.

Differential Revision: http://reviews.llvm.org/D12866

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248689 91177308-0d34-0410-b5e6-96231b3b80d8

[EH] Create removeUnwindEdge utility

Summary:
Factor the code that rewrites invokes to calls and rewrites WinEH
terminators to their "unwind to caller" equivalents into a helper in
Utils/Local, and use it in the three places I'm aware of that need to do
this.

Reviewers: andrew.w.kaylor, majnemer, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13152

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248677 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Removed unnecessary meta attributes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248672 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mc-fuzzer] Fix -jobs option.

The fuzzer argument parser will ignore all options starting with '--' so
operation mode options should begin with '--' and fuzzer options should begin
with '-'. Fuzzer arguments must still follow --fuzzer-args so that they escape
the parsing performed by the CommandLine library.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248671 91177308-0d34-0410-b5e6-96231b3b80d8

[BranchProbability] Manually round the floating point output.

llvm::format compiles down to snprintf which has no defined rounding for
floating point arguments, and MSVC has implemented it differently from
what the BSD libcs and glibc do. Try to emulate the glibc rounding
behavior to avoid changing tests.

While there simplify code a bit and move trivial methods inline.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248665 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove hasPostISelHook from most instructions

Since this is only needed for VOP3 and a few other special
case instructions, stop setting it on everything.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248657 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Switch over reg class size instead of checking all super classes

This gets isSGPRClass out of my profile of SIFixSGPRCopies.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248656 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Don't handle invalid reg classes in helper functions

No tests hit these and it would be better to have checks like
this explicit where they are used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248655 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: address -Winconsistent-missing-override

Add missing override. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248652 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Set CopyCost of register classes

These require multiple mov instructions to copy,
but the default value is that 1 instruction is needed.
I'm not sure if this actually changes anything.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248651 91177308-0d34-0410-b5e6-96231b3b80d8

[Bug 24848] Use range metadata to constant fold comparisons between two values

Summary:
This is the second part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.

If both operands of a comparison have range metadata, they should be used to constant fold the comparison.

Reviewers: sanjoy, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13177

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248650 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: VOP3b definition cleanups

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248647 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix sched model for VOP2b instructions

Trying to use the version with the explicit output operand
would complain because of the missing WriteSALU. I'm not sure
why it doesn't complain about this with the implicit VCC def.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248646 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Rename several functions and types according to the new spec.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248644 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Don't generate clrex for pre-v7 targets.

Since r248294, we emit clrex, but it doesn't exist on v6.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248640 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Reapply 'Teach isLoopBackedgeGuardedByCond to exploit trip counts'

Summary:
If the trip count of a specific backedge is `N`, then we know that
backedge is effectively guarded by the condition `{0,+,1} u< N`. This
change teaches SCEV to use this condition to prove things in
`isLoopBackedgeGuardedByCond`.

Depends on D12948
Depends on D12949

The original checkin, r248608 had to be backed out due to an issue with
a ObjCXX unit test. That issue is now fixed, so re-landing.

Reviewers: atrick, reames, majnemer, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12950

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248638 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Reapply 'Exploit A (A+K) < (B+K) when possible'

Summary:

This change teaches SCEV's `isImpliedCond` two new identities:

 A u (A + C) u< (B + C)
 A s (A + C) s< (B + C)

While these are useful on their own, they're really intended to support
D12950.

The original checkin, r248606 had to be backed out due to an issue with
a ObjCXX unit test. That issue is now fixed, so re-landing.

Reviewers: atrick, reames, majnemer, nlewycky, hfinkel

Subscribers: aadg, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12948

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248637 91177308-0d34-0410-b5e6-96231b3b80d8

LivePhysRegs: Fix live-outs of return blocks

I realized that the live-out set computed for the return block is
missing the callee saved registers (the non-pristine ones to be exact).

This only affects the liveness computed for instructions inside the
function epilogue which currently none of the LivePhysRegs users in llvm
cares about, so this is just a drive-by fix without a testcase.

Differential Revision: http://reviews.llvm.org/D13180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248636 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] match De Morgan's Law hidden by zext ops (PR22723)

This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723

My first attempt at this was to change what I thought was the root problem:

xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32

...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!

My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I
abandoned that idea.

I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):

bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb %sil, %dil
xorb $1, %dil
movb %dil, %al
retq

Differential Revision: http://reviews.llvm.org/D12705

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248634 91177308-0d34-0410-b5e6-96231b3b80d8

Use fixed-point representation for BranchProbability.

BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change:

Pros:

1. It uses only a half space of the current one.
2. Some operations are much faster like plus, subtraction, comparison, and scaling by an integer.

Cons:

1. Constructing a probability using arbitrary numerator and denominator needs additional calculations.
2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio.

One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability has more space which may be a concern.

Differential revision: http://reviews.llvm.org/D12603

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248633 91177308-0d34-0410-b5e6-96231b3b80d8

SelectionDAGDumper: Print simple operands inline.

Print simple operands inline instead of their pointer/value number.
Simple operands are SDNodes without predecessors like Constant(FP), Register,
UNDEF. This unifies the behaviour with dumpr() which was already doing this.

Previously:
 t0: ch = EntryToken
 t1: i64 = Register %vreg0
 t2: i64,ch = CopyFromReg t0, t1
 t3: i64 = Constant<1>
 t4: i64 = add t2, t3
 t5: i64 = Constant<2>
 t6: i64 = add t2, t5
 t10: i64 = undef
 t11: i8,ch = load t0, t2, t10<LD1[%tmp81]>
 t12: i8,ch = load t0, t4, t10<LD1[%tmp10]>
 t13: i8,ch = load t0, t6, t10<LD1[%tmp12]>

Now:
 t0: ch = EntryToken
 t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
 t4: i64 = add t2, Constant:i64<1>
 t6: i64 = add t2, Constant:i64<2>
 t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64
 t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64
 t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64

Differential Revision: http://reviews.llvm.org/D12567

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248628 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Construct new buffer instruction when moving SMRD

It's easier to understand creating a full instruction
than the current situation where sometimes a new
instruction is created and sometimes it is awkwardly
mutated in place.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248627 91177308-0d34-0410-b5e6-96231b3b80d8

DAGCombiner: Check if store is volatile first

This is the simpler check. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248625 91177308-0d34-0410-b5e6-96231b3b80d8

TargetRegisterInfo: Introduce PrintLaneMask.

This makes it more convenient to print lane masks and lead to more
uniform printing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248624 91177308-0d34-0410-b5e6-96231b3b80d8

TargetRegisterInfo: Add typedef unsigned LaneBitmask and use it where apropriate; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248623 91177308-0d34-0410-b5e6-96231b3b80d8

merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)

This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).

The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.

This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.

The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.

Differential Revision: http://reviews.llvm.org/D12635

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248622 91177308-0d34-0410-b5e6-96231b3b80d8

PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint

The algorithm would not modify the live-in list of blocks below the save
block point which is correct unless it happens to be a restore point at
the same time.
Also fixes the benign issue of live-in registers being added twice in
some cases.

The testcase is based on a test submitted by Kit Barton.

Differential Revision: http://reviews.llvm.org/D13176

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248620 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/SI: Use .hsatext section instead of .text for HSA

Reviewers: arsenm, grosbach, rafael

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12424

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248619 91177308-0d34-0410-b5e6-96231b3b80d8

MCAsmInfo: Allow targets to specify when the .section directive should be omitted

Summary:
The default behavior is to omit the .section directive for .text, .data,
and sometimes .bss, but some targets may want to omit this directive for
other sections too.

The AMDGPU backend will uses this to emit a simplified syntax for section
switches. For example if the section directive is not omitted (current
behavior), section switches to .hsatext will be printed like this:

.section .hsatext,#alloc,#execinstr,#write

This is actually wrong, because .hsatext has some custom STT_* flags,
which MC doesn't know how to print or parse.

If the section directive is omitted (made possible by this commit),
section switches will be printed like this:

.hsatext

The motivation for this patch is to make it possible to emit sections
with custom STT_* flags without having to teach MC about all the target
specific STT_* flags.

Reviewers: rafael, grosbach

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12423

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248618 91177308-0d34-0410-b5e6-96231b3b80d8

MachineBasicBlock: Factor out common code into isReturnBlock()

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248617 91177308-0d34-0410-b5e6-96231b3b80d8

Revert two SCEV changes that caused test failures in clang.

r248606: "[SCEV] Exploit A (A+K) < (B+K) when possible"
r248608: "[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts."

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248614 91177308-0d34-0410-b5e6-96231b3b80d8

ADCE: Fix typo in file comment. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248613 91177308-0d34-0410-b5e6-96231b3b80d8

PeepholeOptimizer: Remove redundant copies

If a virtual register is copied and another copy was already
seen, replace with the previous copy. This only handles the
simplest cases for now.

This pattern shows up from various operand restrictions
AMDGPU has which require inserting copies depending
on the register class of the operands.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248611 91177308-0d34-0410-b5e6-96231b3b80d8

Simplify code. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248610 91177308-0d34-0410-b5e6-96231b3b80d8

more space; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248609 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts.

Summary:
If the trip count of a specific backedge is `N`, then we know that
backedge is effectively guarded by the condition `{0,+,1} u< N`. This
change teaches SCEV to use this condition to prove things in
`isLoopBackedgeGuardedByCond`.

Depends on D12948
Depends on D12949

Reviewers: atrick, reames, majnemer, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12950

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248608 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Extract helper function from isImpliedCond; NFC

Summary:
This new helper routine will be used in a subsequent change.

Reviewers: hfinkel

Subscribers: hfinkel, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12949

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248607 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Exploit A (A+K) < (B+K) when possible

Summary:

This change teaches SCEV's `isImpliedCond` two new identities:

A u (A + C) u< (B + C)
A s (A + C) s< (B + C)

While these are useful on their own, they're really intended to support
D12950.

Reviewers: atrick, reames, majnemer, nlewycky, hfinkel

Subscribers: aadg, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12948

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248606 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Add some more tests for literal operands

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248600 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Make getNamedOperandIdx declaration readonly

This matches how it is defined in the generated implementation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248598 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Add support for generating pre- and post-index load/store pairs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248593 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Disable some passes that are not meaningful

Don't run passes related to stack maps, garbage collection,
exceptions since these aren't useful for GPUs.

There might be a few more to turn off that I'm less sure about
(e.g. ShrinkWrapping) or I'm not sure how to disable
(SafeStack and StackProtector)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248591 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG

This fixes a select error when the i64 source was also
bitcasted to v2i32 in the original source.

Instead of awkwardly trying to select the modified source value and
the store, replace before isel begins.

Uses a worklist to avoid possible problems from mutating the DAG,
although it seems to work OK without it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248589 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix recomputing dominator tree unnecessarily

SIFixSGPRCopies does not modify the CFG, but this was
being recomputed before running SIFoldOperands.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248587 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Re-justify workaround and fix worked around problem

When buffer resource descriptors were built, the upper two components
of the descriptor were first composed into a 64-bit register because
legalizeOperands assumed all operands had the same register class.
Fix that problem, but keep the workaround. I'm not sure anything
actually is actually emitting such a REG_SEQUENCE now.

If multiple resource descriptors are set up with different base
pointers, this is copied with a single s_mov_b64. We probably
should fix this better by recognizing a pair of s_mov_b32 later,
but for now delete the dead code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248585 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sources

This avoids needting to re-legalize the new REG_SEQUENCE.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248584 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix not adding exec to defs of cmpx instruction pseudos

This was only set on the final _si/_vi version, but not
on the pseudos most of codegen sees.

No test since these instructions aren't used yet.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248583 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Improve accuracy of instruction rates for VOPC

These were all using the default 32-bit VALU write class,
but the i64/f64 compares are half rate.

I'm not sure this is really correct, because they are still using
the write to VALU write class, even though they really write
to the SALU.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248582 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalsAA] Teach GlobalsAA about nocapture

Arguments to function calls marked "nocapture" can be marked as
non-escaping. However, nocapture is defined in terms of the lifetime
of the callee, and if the callee can directly or indirectly recurse to
the caller, the semantics of nocapture are invalid.

Therefore, we eagerly discover which SCC each function belongs to,
and later can check if callee and caller of a callsite belong to
the same SCC, in which case there could be recursion.

This means that we can't be so optimistic in
getModRefInfo(ImmutableCallsite) - previously we assumed all call
arguments never aliased with an escaping global. Now we need to check,
because a global could now be passed as an argument but still not
escape.

This also solves a related conformance problem: MemCpyOptimizer can
turn non-escaping stores of globals into calls to intrinsics like
llvm.memcpy/llvm/memset. This confuses GlobalsAA, which knows the
global can't escape and so returns NoModRef when queried, when
obviously a memcpy/memset call does indeed reference and modify its
arguments.

This fixes PR24800, PR24801, and PR24802.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248576 91177308-0d34-0410-b5e6-96231b3b80d8

ARM: make -Asserts,-Werror=unused-variable build happy

The value was only used in an assertion. Sink the variable usage into the
assertion.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248562 91177308-0d34-0410-b5e6-96231b3b80d8

ARM: address WoA division limitation

We now emit the compiler generated divide by zero check that was needed for the
MSVC routines.  We construct a psuedo-instruction for the DBZ check as the
operation requires splitting up the BB.  For the 64-bit operations, we need to
custom expand the node as we need to insert the DBZ check and then emit the
libcall to the appropriate name.  Because this is target specific, it seemed
better to reproduce the expansion operation from the target-agnostic type
legalization rather than sink this there to avoid the duplication.  The division
library calls now match MSVC semantically.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248561 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove unused includes

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248553 91177308-0d34-0410-b5e6-96231b3b80d8

[LangRef] Unbreak the docs Sphinx build.

r248551 introduced some breakage due to incorrectly terminated
``literals`` s.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248552 91177308-0d34-0410-b5e6-96231b3b80d8

[Bitcode][Asm] Teach LLVM to read and write operand bundles.

Summary:
This also adds the first set of tests for operand bundles.

The optimizer has not been audited to ensure that it does the right
thing with operand bundles.

Depends on D12456.

Reviewers: reames, chandlerc, majnemer, dexonsmith, kmod, JosephTremoulet, rnk, bogner

Subscribers: maksfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D12457

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248551 91177308-0d34-0410-b5e6-96231b3b80d8

Restore test coverage for other than ELFOSABI_NONE

Add a FreeBSD test to restore testing of ELF OSABI other than
ELFOSABI_NONE after r248534.

Differential Revision: http://reviews.llvm.org/D13146

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248550 91177308-0d34-0410-b5e6-96231b3b80d8

Fix typo

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248549 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Improve the readability of the ld/st optimization pass. NFC.

In this context, MI is an add/sub instruction not a loads/store.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248540 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE2] Fix zero/any extension shuffles that don't start from the first element

Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248536 91177308-0d34-0410-b5e6-96231b3b80d8

Use ELFOSABI_NONE instead of ELFOSABI_LINUX.

The doesn't seem to be a difference and ELFOSABI_NONE seems to be far more
common:

* Linux doesn't care when loading and puts ELFOSABI_NONE on core dumps.
* Gold and bfd ld produce files with ELFOSABI_NONE.
* Gold and bfd ld seems to ignore EI_OSABI other than for freebsd.
* Gas puts ELFOSABI_NONE in most .o files.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248534 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Add s_dcache_* instructions

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248533 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Add cache invalidation instructions.

These are necessary for implementing mem_fence for
OpenCL 2.0.

The VI assembler tests are disabled since it seems to be
using the wrong encoding or opcode.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248532 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Run mubuf assembler test for CI

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248531 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] The paired post-increment store instruction has an output register.

The pre- and post-increment version update the base register, but the post-
version was defined incorrectly. There is no test case as we don't currently
generate these instructions, but I plan on changing that in the near future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248528 91177308-0d34-0410-b5e6-96231b3b80d8

[IR] Add operand bundles to CallInst and InvokeInst.

Summary:
This change teaches `CallInst`s and `InvokeInst`s to maintain a set of
operand bundles as part of its operands. `CallInst`s and `InvokeInst`s
with operand bundles co-allocate some space before their `Use` array to
hold meta information about which of its operands are part of an operand
bundle.

The strings corresponding to the bundle tags are interned into
`LLVMContextImpl::BundleTagCache`

This change does not include any parsing / bitcode support. That's the
next change.

Depends on D12455.

Reviewers: reames, chandlerc, majnemer, dexonsmith, kmod, JosephTremoulet, rnk, bogner

Subscribers: MatzeB, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12456

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248527 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def

Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a
hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with
a FIXME: attached.

This patch changes the handling of +t2dsp to be in line with other
architecture extensions.

Following a revert of r248152 and new review comments, this patch also includes
renaming FeatureDSPThumb2 -> FeatureDSP, hasThumb2DSP() -> hasDSP(), etc.
The spelling of "t2dsp" is preserved, pending a further investigation of its
possible external usage.

Differential Revision: http://reviews.llvm.org/D12937

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248519 91177308-0d34-0410-b5e6-96231b3b80d8

dsymutil: Fix the condition to distinguish module imports form definitions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248512 91177308-0d34-0410-b5e6-96231b3b80d8

[ValueTracking] Teach isKnownNonZero a new trick

If the shifter operand is a constant, and all of the bits shifted out
are known to be zero, then if X is known non-zero at least one
non-zero bit must remain.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248508 91177308-0d34-0410-b5e6-96231b3b80d8

[objdump] Make iterator operator* return a reference.

This is closer to the expected behavior of an iterator and avoids awkward
warnings from clang's -Wrange-loop-analysis below.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248497 91177308-0d34-0410-b5e6-96231b3b80d8

Regression Test: Deletes redundant/invalid test.

Removes absdiff_expand.ll regression test file which is invalid.

Diffrential Revision: http://reviews.llvm.org/D11678

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248493 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Use PredicateControl for the MSA ASE instructions. NFC.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13092

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248486 91177308-0d34-0410-b5e6-96231b3b80d8

Codegen: Fix llvm.*absdiff semantic.

Fixes the overflow case of llvm.*absdiff intrinsic also updats the tests and LangRef.rst accordingly.

Differential Revision: http://reviews.llvm.org/D11678

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248483 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Recognize another bswap idiom.

Summary:
The byte-swap recognizer can now notice that this

```
uint32_t bswap(uint32_t x)
{
 x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
 x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;
 return x;
}
```

is a bswap. Fixes PR23863.

Reviewers: nlewycky, hfinkel, hans, jmolloy, rengolin

Subscribers: majnemer, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D12637

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248482 91177308-0d34-0410-b5e6-96231b3b80d8

Introduce target hook for optimizing register copies

Allow a target to do something other than search for copies
that will avoid cross register bank copies.

Implement for SI by only rewriting the most basic copies,
so it should look through anything like a subregister extract.

I'm not entirely satisified with this because it seems like
eliminating a reg_sequence that isn't fully used should work
generically for all targets without them having to override
something. However, it seems to be tricky to have a simple
implementation of this without rewriting to invalid kinds
of subregister copies on some targets.

I'm not sure if there is currently a generic way to easily check
if a subregister index would be valid for the current use.
The current set of TargetRegisterInfo::get*Class functions don't
quite behave like I would expect (e.g. getSubClassWithSubReg
returns the maximal register class rather than the minimal), so
I'm not sure how to make the generic test keep searching if
SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making
the default implementation to check for simple copies breaks
a variety of ARM and x86 tests by producing illegal subregister uses.

The ARM tests are not actually changed since it should still be using
the same sharesSameRegisterFile implementation, this just relaxes
them to not check for specific registers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248478 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Return after instruction is processed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248476 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove another unnecessary check from commuteInstruction

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248475 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Add readonly to InstrMapping functions

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248474 91177308-0d34-0410-b5e6-96231b3b80d8

TableGen: Add LLVM_READONLY to generated InstrMapping functions

These just read from a generated table.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248473 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix printing trailing whitespace for mubuf atomics

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248472 91177308-0d34-0410-b5e6-96231b3b80d8

Remove dead declaration

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248471 91177308-0d34-0410-b5e6-96231b3b80d8

Use new TokenFactor chain when merging stores

If the stores are storing values from loads which partially
alias the stores, we could end up placing the merged loads
and stores on the same chain which has the potential to break.
Each store may have a different chain dependency on only some
of the original loads. Create a new TokenFactor to capture all
of the required dependencies of the stores rather than assuming
all stores can use the same chain.

The testcase is a situation where this happens, although
it does not have an observable change from this. The DAG nodes
just happened to not be reordered before despite this missing
chain dependency.

This is based on an off-list report for an out of tree target
which regressed due to r246307 and I haven't managed to find a case
where the nodes do end up reordered with an in tree target.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248468 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Reduce number of copies emitted

Instead of always inserting a copy in case
the super register is itself a subregister,
only extract to the super reg class if this is
actually the case.

This shouldn't really change codegen, but
makes looking at the output of SIFixSGPRCopies
easier to read.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248467 91177308-0d34-0410-b5e6-96231b3b80d8

Fix a think-o in which functions these should surround

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248465 91177308-0d34-0410-b5e6-96231b3b80d8

Add some NDEBUG checks I accidentally dropped in r248462

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248464 91177308-0d34-0410-b5e6-96231b3b80d8

BasicAA: Move BasicAAResult::alias out-of-line. NFC

This makes the header more readable and cleans up some unnecessary
header differences between NDEBUG and !NDEBUG.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248462 91177308-0d34-0410-b5e6-96231b3b80d8

Add CFG Simplification pass after Loop Unswitching.

Loop unswitching produces conditional branches with constant condition,
and it's beneficial for later passes to clean this up with simplify-cfg.
We do this after the second invocation of loop-unswitch, but not after
the first one. Not doing so might cause problem for passes like
LoopUnroll, whose estimate of loop body size would be less accurate.

Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D13064

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248460 91177308-0d34-0410-b5e6-96231b3b80d8

[safestack] Fix compiler crash in the presence of stack restores.

A use can be emitted before def in a function with stack restore
points but no static allocas.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248455 91177308-0d34-0410-b5e6-96231b3b80d8

[IR] Teach `llvm::User` to co-allocate a descriptor.

Summary:
With this change, subclasses of `llvm::User` will be able to co-allocate
a variable number of bytes (called a "descriptor") with the `llvm::User`
instance. The co-allocated descriptor can later be accessed using
`llvm::User::getDescriptor`. This will be used in later changes to
implement operand bundles.

This change steals one bit from `NumUserOperands`, but given that it is
still 28 bits wide I don't think this will be a practical issue.

This change does not allow allocating hung off uses with descriptors.
This only for simplicity, not for any fundamental reason; and we can
easily add this functionality later if needed.

Reviewers: reames, chandlerc, dexonsmith, kmod, majnemer, pete, JosephTremoulet

Subscribers: pete, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12455

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248453 91177308-0d34-0410-b5e6-96231b3b80d8

Add REQUIRES: default_triple to these testcases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248452 91177308-0d34-0410-b5e6-96231b3b80d8

Remove iterator_range::end.

Because the current proposal does not include that member function,
and we are trying to keep in line with that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248451 91177308-0d34-0410-b5e6-96231b3b80d8

Add iterator_range::end() predicate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248447 91177308-0d34-0410-b5e6-96231b3b80d8

[Unroll] When completely unrolling the loop, replace conditinal branches with unconditional.

Nothing is expected to change, except we do less redundant work in
clean-up.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12951

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248444 91177308-0d34-0410-b5e6-96231b3b80d8

Put profile variables of COMDAT functions to it's own COMDAT group.

In -fprofile-instr-generate compilation, to remove the redundant profile
variables for the COMDAT functions, these variables are placed in the same
COMDAT group as its associated function. This way when the COMDAT function
is not picked by the linker, those profile variables will also not be
output in the final binary. This may cause warning when mix link objects
built w and wo -fprofile-instr-generate.

This patch puts the profile variables for COMDAT functions to its own COMDAT
group to avoid the problem.

Patch by xur.
Differential Revision: http://reviews.llvm.org/D12248

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248440 91177308-0d34-0410-b5e6-96231b3b80d8

set div/rem default values to 'expensive' in TargetTransformInfo's cost model

...because that's what the cost model was intended to do.

As discussed in D12882, this fix has a temporary unintended consequence for
SimplifyCFG: it causes us to not speculate an fdiv. However, two wrongs make
PR24818 right, and two wrongs make PR24343 act right even though it's really
still wrong.

I intend to correct SimplifyCFG and add to CodeGenPrepare to account for this
cost model change and preserve the righteousness for the bug report cases.

https://llvm.org/bugs/show_bug.cgi?id=24818
https://llvm.org/bugs/show_bug.cgi?id=24343

Differential Revision: http://reviews.llvm.org/D12882

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248439 91177308-0d34-0410-b5e6-96231b3b80d8

ARM: fix folding stack adjustment (again again again...)

This time, the issue is that we weren't accounting for the possibility that
aligned DPRs could have been stored after the final "push" in a prologue. When
that happened we effectively moved a "sub sp, #N" from below the aligned stores
to above them, and everything went to pot.

To make it worse, I'd actually committed something testing that we produced
wrong code, so the test update is tiny.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248437 91177308-0d34-0410-b5e6-96231b3b80d8

dsymutil: Don't prune forward declarations inside a module definition.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248428 91177308-0d34-0410-b5e6-96231b3b80d8