Programming Languages Research Group: Git

X86: Don't drop half of the mask when converting 2-address shufps into 3-address pshufd.

It's debatable whether this transform is useful at all, but for now make sure
we don't generate invalid asm.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219084 91177308-0d34-0410-b5e6-96231b3b80d8

AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q.
This instruction allows to broadacst mask vector to data vector.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219083 91177308-0d34-0410-b5e6-96231b3b80d8

Simplify code. No functionality change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219082 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Fix PR21139, one of the last remaining regressions found in the
new vector shuffle lowering.

This is loosely based on a patch by Marius Wachtler to the PR (thanks!).
I refactored it a bi to use std::count_if and a mutable array ref but
the core idea was exactly right. I also added some direct testing of
this case.

I believe PR21137 is now the only remaining regression.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219081 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Teach the new vector shuffle lowering how to lower 128-bit
shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the
few remaining regressions impacting benchmarks from the new vector
shuffle lowering.

You may note that it "regresses" many of the vperm2x128 test cases --
these were actually "improved" by the naive lowering that the new
shuffle lowering previously did. This regression gave me fits. I had
this patch ready-to-go about an hour after flipping the switch but
wasn't sure how to have the best of both worlds here and thought the
correct solution might be a completely different approach to lowering
these vector shuffles.

I'm now convinced this is the correct lowering and the missed
optimizations shown in vperm2x128 are actually due to missing
target-independent DAG combines. I've even written most of the needed
DAG combine and will submit it shortly, but this part is ready and
should help some real-world benchmarks out.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219079 91177308-0d34-0410-b5e6-96231b3b80d8

HexagonMCCodeEmitter.cpp: Prune 2nd redundant \brief. [-Wdocumentation]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219073 91177308-0d34-0410-b5e6-96231b3b80d8

[CMake] HexagonTests: Update LINK_COMPONENTS.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219072 91177308-0d34-0410-b5e6-96231b3b80d8

HexagonDesc: Update LLVMBuild.txt.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219071 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Simplify the logic from r219067 using ValueTracking

Joerg suggested on IRC that I look at generalizing the logic from r219067 to
handle more general redundancies (like removing an assume(x > 3) dominated by
an assume(x > 5)). The way to do this would be to ask ValueTracking to
determine the value of the i1 argument. It turns out that ValueTracking is not
very good at this right now (although it does get the trivial redundancy case)
because it does not understand ICmps. Nevertheless, the resulting code in
InstCombine is simpler than r219067, so we might as well do it now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219070 91177308-0d34-0410-b5e6-96231b3b80d8

[SystemZ] Make operator bool explicit. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219069 91177308-0d34-0410-b5e6-96231b3b80d8

Make AAMDNodes ctor and operator bool (!!!) explicit, mop up bugs and weirdness exposed by it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219068 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Remove redundant @llvm.assume intrinsics

For any @llvm.assume intrinsic, if there is another which dominates it and uses
the same condition, then it is redundant and can be removed. While this does
not alter the semantics of the @llvm.assume intrinsics, it makes subsequent
handling more efficient (and the resulting IR easier to read).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219067 91177308-0d34-0410-b5e6-96231b3b80d8

Solve Visual C++ warning C4805 on getAsInteger<bool>.

Fix http://llvm.org/PR21158 by adding a cast to unsigned long long,
so the comparison would be between two unsigned long longs instead
of bool and unsigned long long.

if (getAsUnsignedInteger(*this, Radix, ULLVal) ||
static_cast<unsigned long long>(static_cast<T>(ULLVal)) != ULLVal)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219065 91177308-0d34-0410-b5e6-96231b3b80d8

Remove unnecessary copying or replace it with moves in a bunch of places.

NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219061 91177308-0d34-0410-b5e6-96231b3b80d8

Sink DwarfDebug::updateSubprogramScopeDIE into DwarfCompileUnit

This requires exposing some of the current function state from
DwarfDebug. I hope there's not too much of that to expose as I go
through all the functions, but it still seems nicer to expose singular
data down to multiple consumers, than have consumers expose raw mapping
data structures up to DwarfDebug for building subprograms.

Part of a series of refactoring to allow subprograms in both the
skeleton and dwo CUs under Fission.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219060 91177308-0d34-0410-b5e6-96231b3b80d8

Reformatting accidentally left out of r219057

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219059 91177308-0d34-0410-b5e6-96231b3b80d8

Sink DwarfDebug::attachLowHighPC into DwarfCompileUnit

One of many things to sink down into DwarfCompileUnit to allow handling
of subprograms in both the skeleton and dwo CU under Fission.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219058 91177308-0d34-0410-b5e6-96231b3b80d8

Move DwarfCompileUnit from DwarfUnit.h to its own header (DwarfCompileUnit.h)

In preparation for sinking all the subprogram emission code down from
DwarfDebug into DwarfCompileUnit, this will avoid bloating
DwarfUnit.h/cpp greatly and make concerns a bit more clear/isolated.

(sinking this handling down is part of the work to handle emitting
minimal subprograms for -gmlt-like data into the skeleton CU under
fission)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219057 91177308-0d34-0410-b5e6-96231b3b80d8

DI: Fixup global syntax in example

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219056 91177308-0d34-0410-b5e6-96231b3b80d8

DI: Line up comments in examples

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219055 91177308-0d34-0410-b5e6-96231b3b80d8

DI: Fixup example IR from r219051

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219054 91177308-0d34-0410-b5e6-96231b3b80d8

DI: Prune another example

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219053 91177308-0d34-0410-b5e6-96231b3b80d8

DI: Update and prune metadata examples

Update a couple of the examples of debug info metadata, and prune the
rest. Point to the true reference implementation in the source.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219051 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Slap a triple on this test since it is poking around at the stack
and calling conventions. Otherwise its too hard to craft a usefully
generic set of assertions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219047 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Enable the new vector shuffle lowering by default.

Update the entire regression test suite for the new shuffles. Remove
most of the old testing which was devoted to the old shuffle lowering
path and is no longer relevant really. Also remove a few other random
tests that only really exercised shuffles and only incidently or without
any interesting aspects to them.

Benchmarking that I have done shows a few small regressions with this on
LNT, zero measurable regressions on real, large applications, and for
several benchmarks where the loop vectorizer fires in the hot path it
shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy
Bridge machines. Running on AMD machines shows even more dramatic
improvements.

When using newer ISA vector extensions the gains are much more modest,
but the code is still better on the whole. There are a few regressions
being tracked (PR21137, PR21138, PR21139) but by and large this is
expected to be a win for x86 generated code performance.

It is also more correct than the code it replaces. I have fuzz tested
this extensively with ISA extensions up through AVX2 and found no
crashes or miscompiles (yet...). The old lowering had a few miscompiles
and crashers after a somewhat smaller amount of fuzz testing.

There is one significant area where the new code path lags behind and
that is in AVX-512 support. However, there was *extremely little*
support for that already and so this isn't a significant step backwards
and the new framework will probably make it easier to implement lowering
that uses the full power of AVX-512's table-based shuffle+blend (IMO).

Many thanks to Quentin, Andrea, Robert, and others for benchmarking
assistance. Thanks to Adam and others for help with AVX-512. Thanks to
Hal, Eric, and *many* others for answering my incessant questions about
how the backend actually works. =]

I will leave the old code path in the tree until the 3 PRs above are at
least resolved to folks' satisfaction. Then I will rip it (and 1000s of
lines of code) out. =] I don't expect this flag to stay around for very
long. It may not survive next week.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219046 91177308-0d34-0410-b5e6-96231b3b80d8

Add fake use to suppress defined-but-unused warnings

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219045 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Fix a bug in the VZEXT DAG combine that I just made more powerful.

It turns out this combine was always somewhat flawed -- there are cases
where nested VZEXT nodes *can't* be combined: if their types have
a mismatch that can be observed in the result. While none of these show
up in currently, once I switch to the new vector shuffle lowering a few
test cases actually form such nested VZEXT nodes. I've not come up with
any IR pattern that I can sensible write to exercise this, but it will
be covered by tests once I flip the switch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219044 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Sink a generic combine of VZEXT nodes from the lowering to VZEXT
nodes to the DAG combining of them.

This will allow the combine to fire on both old vector shuffle lowering
and the new vector shuffle lowering and generally seems like a cleaner
design. I've trimmed down the code a bit and tried to make it and the
surrounding combine fairly clean while moving it around.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219042 91177308-0d34-0410-b5e6-96231b3b80d8

R600/SI: Custom lower f64 -> i64 conversions

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219038 91177308-0d34-0410-b5e6-96231b3b80d8

R600: Custom lower [s|u]int_to_fp for i64 -> f64

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219037 91177308-0d34-0410-b5e6-96231b3b80d8

R600/SI: Fix ftrunc f64 conformance failures.

Re-add the tests since they were deleted at some point

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219036 91177308-0d34-0410-b5e6-96231b3b80d8

Remove unused ALL_BINDINGS configuration variable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219035 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Add a really preposterous number of patterns for matching all of
the various ways in which blends can be used to do vector element
insertion for lowering with the scalar math instruction forms that
effectively re-blend with the high elements after performing the
operation.

This then allows me to bail on the element insertion lowering path when
we have SSE4.1 and are going to be doing a normal blend, which in turn
restores the last of the blends lost from the new vector shuffle
lowering when I got it to prioritize insertion in other cases (for
example when we don't *have* a blend instruction).

Without the patterns, using blends here would have regressed
sse-scalar-fp-arith.ll *completely* with the new vector shuffle
lowering. For completeness, I've added RUN-lines with the new lowering
here. This is somewhat superfluous as I'm about to flip the default, but
hey, it shows that this actually significantly changed behavior.

The patterns I've added are just ridiculously repetative. Suggestions on
making them better very much welcome. In particular, handling the
commuted form of the v2f64 patterns is somewhat obnoxious.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219033 91177308-0d34-0410-b5e6-96231b3b80d8

Converting the ErrorHandlerMutex to a ManagedStatic to avoid the static constructor and destructor.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219028 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Adjust the patterns for lowering X86vzmovl nodes which don't
perform a load to use blendps rather than movss when it is available.

For non-loads, blendps is *much* faster. It can execute on two ports in
Sandy Bridge and Ivy Bridge, and *three* ports on Haswell. This fixes
one of the "regressions" from aggressively taking the "insertion" path
in the new vector shuffle lowering.

This does highlight one problem with blendps -- it isn't commuted as
heavily as it should be. That's future work though.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219022 91177308-0d34-0410-b5e6-96231b3b80d8

Fix typo in TableGen documentation

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219018 91177308-0d34-0410-b5e6-96231b3b80d8

Add unit tests to verify Hexagon emission.

Add the test cases I overlooked, part of the original commit,
http://reviews.llvm.org/D5523

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219016 91177308-0d34-0410-b5e6-96231b3b80d8

Add a reference to Phabricator.rst to docs/index.rst.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219015 91177308-0d34-0410-b5e6-96231b3b80d8

PR21145: Teach LLVM about C++14 sized deallocation functions.

C++14 adds new builtin signatures for 'operator delete'. This change allows
new/delete pairs to be removed in C++14 onwards, as they were in C++11 and
before.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219014 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Revert "DI: Fold constant arguments into a single MDString""

This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash.  The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).

Original commit message follows.

--

This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219010 91177308-0d34-0410-b5e6-96231b3b80d8

[ISel] Keep matching state consistent when folding during X86 address match

In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends.  During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.

However as demonstrated by the testcase, this can trigger CSE of not only the
shift and the AND which the code is prepared for but also the underlying load
node.  In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE.  Returning
from the complex pattern function, we try to access it again hitting an assert
because the node is no longer a load even though this was checked before.

Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat.  Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it create a non-canonical node).  We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed.  On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.

I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask.  This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user.  Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).

So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE.  This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly.  (Previous patches used HandleSDNode to detect the
CSE but that's not practical here).  The listener is only installed on X86.

I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc.  The only thing we pay for is the
creation of the listener.  The callback never ever triggers in spec2k since
this is a corner case.

Fixes rdar://problem/18206171

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219009 91177308-0d34-0410-b5e6-96231b3b80d8

R600: Align functions to 256 bytes

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219002 91177308-0d34-0410-b5e6-96231b3b80d8

Eliminate some deep std::vector copies. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218999 91177308-0d34-0410-b5e6-96231b3b80d8

MCParser: Modernize memory handling.

NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218998 91177308-0d34-0410-b5e6-96231b3b80d8

[Power] Delete redundant test Atomics-32.ll

The test Atomics-32.ll was both redundant (all operations are also checked by
atomics.ll at least) and not actually checking correctness (it was not using
FileCheck, just verifying that the compiler does not crash).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218997 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-readobj: print out the fields of the COFF delay-import table

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218996 91177308-0d34-0410-b5e6-96231b3b80d8

[Power] Use lwsync for non-seq_cst fences

Summary:
hwsync is only required for seq_cst fences, acquire and release one can use
the cheaper lwsync.

Test Plan: Added some cases to atomics.ll + make check-all

Reviewers: jfb, wschmidt

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5317

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218995 91177308-0d34-0410-b5e6-96231b3b80d8

MipsAsmParser.cpp: fix VS2012 build

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218991 91177308-0d34-0410-b5e6-96231b3b80d8

HexagonMCCodeEmitter.h: deleted member functions are not supported in VS2012

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218990 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Print warning when using register names not available in N32/64

Summary:
The register names t4-t7 are not available in the N32 and N64 ABIs.
This patch prints a warning, when those names are used in N32/64,
along with a fix-it with the correct register names.

Patch by Vasileios Kalintiris

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5272

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218989 91177308-0d34-0410-b5e6-96231b3b80d8

Fix build break on Hexagon

Differential Revision: http://reviews.llvm.org/D5600

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218987 91177308-0d34-0410-b5e6-96231b3b80d8

Adding skeleton for unit testing Hexagon Code Emission

Adding and modifying CMakeLists.txt files to run unit tests under
unittests/Target/* if the directory exists. Adding basic unit test to check
that code emitter object can be retrieved.

Differential Revision: http://reviews.llvm.org/D5523
Change by: Colin LeMahieu

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218986 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Teach the new vector shuffle lowering to aggressively form MOVSS
and MOVSD nodes for single element vector inserts.

This is particularly important because a number of patterns in the
backend detect these patterns and leverage them to simplify things. It
also fixes quite a few of the insertion bad code examples. However, it
regresses a specific area: when available, blendps and blendpd are
*dramatically* faster than movss and movsd respectively. But it doesn't
really work to form the blend logic first because the blends *aren't* as
crazy efficient when the data is coming from memory anyways, and thus
will have a movss or movsd regardless. Also, doing that would block
a bunch of the patterns that this is designed to hit.

So my plan is to go into the patterns for lowering MOVSS and MOVSD and
lower them via blends when available. However that's a pretty invasive
restructuring so it will need to be a follow-up patch.

I have already gone into the patterns to lower MOVSS and MOVSD from
memory using MOVLPD, etc. Without that, several of the test cases
I already have regress.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218985 91177308-0d34-0410-b5e6-96231b3b80d8

[sphinx cleanup] Fix unexpected indentation warning introduced by r218937

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218982 91177308-0d34-0410-b5e6-96231b3b80d8

Revert 202433 - Provide a target override for the latest regalloc heuristic

That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analisys indicated that the
problem no longer exists, so I'm reverting this change.

See PR18996.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218981 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Refactor the element insertion logic in the new vector shuffle
lowering to handle the potential mirroring of 2-element vectors (because
we can't reliably sort them one way) in the caller rather than in the
insertion logic.

This will simplify things considerably as more ways to fail to match the
insertion are added because now we have a nice try and retry point.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218980 91177308-0d34-0410-b5e6-96231b3b80d8

Fix typo in comment

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218979 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Fix the RUN-lines of this test to make sense.

I got them quite wrong when updating it and had the SSE4.1 run checked
for SSE2 and the SSE2 run checked for SSE4.1. I think everything was
actually generic SSE, but this still seems good to fix. While here,
hoist the triple into the IR and make the flag set a bit more direct in
what it is trying to test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218978 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Significantly improve the ability of the new vector shuffle
lowering to match VZEXT_MOVL patterns.

I hadn't realized that these had sufficient pattern smarts in the
backend to lower zext-ing from the low element of a vector without it
being a scalar_to_vector node. They do, and this is how to match a bunch
of patterns for movq, movss, etc.

There is a weird propensity to end up using pshufd to place the element
afterward even though it means domain crossing (or rather, to use
xorps+movss to zext the element rather than movq) but that's an
orthogonal problem with VZEXT_MOVL that someone should probably look at.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218977 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Add some important, missing test coverage for blending from one
vector to a zero vector for the v2 cases and fix the v4 integer cases to
actually blend from a vector.

There are already seprate tests for the case of inserting from a scalar.

These cases cover a lot of the regressions I've seen in the regression
test suite for the new vector shuffle lowering and specifically cover
the reported lack of using various zext-ing instruction patterns. My
next patch should fix a big chunk of this, but wanted to get a nice
baseline for these patterns in the test cases first.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218976 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Unbreak SSE1 with the new vector shuffle lowering. We can't widen
element types to form illegal vector types.

I've added a special SSE1 test case here that makes sure we don't break
this going forward.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218974 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Add two more triples to stabilize the precise assembly syntax
across platforms.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218973 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Remove a couple of fairly pointless tests. These were merely
testing that we generated divps and divss but not in a very systematic
way. There are other tests for widening binary operations already that
make these unnecessary.

The second one seems mostly about testing Atom as well as normal X86,
but despite the comment claiming it is testing a different instruction
sequence, it then tests for exactly the same div instruction sequence!
(The sequence of instructions is actually quite different on Atom, but
not the sequence of div instructions....)

And then it has an "execution" test that simply isn't run? Very strange.
Anyways, none of this is really needed so clean this up.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218972 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r215343.

This was contentious and needs invesigation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218971 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Remove XFAIL from two XPASS'ing tests on the llvm-mips-linux builder

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218967 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Add another triple to a test to make the comment syntax stable.
Should fix darwin builders.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218956 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Add triples to these tests so that we see fewer calling convention
differences and they're a bit easier to maintain. This should fix the
tests on cygwin bots, etc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218955 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Regenerate precise FileCheck lines for the lats batch of test
cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218954 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Remove another low-value test still written using grep. We have
many tests for movss and friends.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218953 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Regenerate precise checks for a couple of test cases and remove
a test case that was just grepping the debug stats output rather than
actually checking the generated code for anything useful.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218951 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Remove an over-reduced test case. This would need to be
intergrated much more fully into some logical part of the backend to
really understand what it is trying to accomplish and how to update it.
I suspect it no longer holds enough value to be worth having.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218950 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Regenerate and clean up more tests is preparation for vector
shufle switch.

I nuked a win64 config from one test as it doesn't really make sense to
cover that ABI specially for generic v2f32 tests...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218948 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Cleanup and generate precise FileCheck assertions for a bunch of
SSE tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218947 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] This is a terrible SSE1 test, but we should keep it. I've deleted
two functions that really didn't have any interesting assertions, and
generated more precise tests for one of the others.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218946 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Merge two very similar tests and regenerate FileCheck lines for
them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218945 91177308-0d34-0410-b5e6-96231b3b80d8

[BasicAA] Revert r218714 - Make better use of zext and sign information.

This patch broke 447.dealII on Darwin. I'm currently working on a reduced
test-case, but reverting for now to keep the bots happy.

<rdar://problem/18530107>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218944 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Regenerate a number of FileCheck assertions with my script for
test cases that will change with the new vector shuffle lowering. This
gives us a nice baseline for deltas against. I've checked and removed
the cases where there were weird register usage being pinned down, and
all of these are extremely pin-pointed tests so fully checking them
seems very appropriate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218941 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Remove a couple of other overly isolated tests that are low-value
at this point. We have lots of tests of peephole optimizations with
insert and extract on vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218940 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Remove a test that provides little value. There are plenty of
tests for zext of a vector.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218939 91177308-0d34-0410-b5e6-96231b3b80d8

Update Atomics.rst

Summary:
I changed various bits of the compilation of atomics recently, and forgot
updating the documentation. This patch just brings it up to date.

Test Plan: no change to the code

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5590

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218937 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Regenerate a bunch more avx512 test cases using my script to have
tighter, more strict FileCheck assertions. Some of these I really like
as they show case exactly what instruction sequences come out of these
microscopic functionality tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218936 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Regenerate an avx512 test with my script to provide a nice
baseline for updates from the new vector shuffle lowering.

I've inspected the results here, and I couldn't find any register
allocation decisions where there should be any realistic way to register
allocate things differently. The closest was the imul test case. If you
see something here you'd like register number variables on, just shout
and I'll add them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218935 91177308-0d34-0410-b5e6-96231b3b80d8

constify TargetMachine parameter.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218934 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-readobj: print COFF delay-load import table

This patch adds another iterator to access the delay-load import table
and use it from llvm-readobj.

http://reviews.llvm.org/D5594

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218933 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Remove some of the --show-mc-encoding flags from avx512 tests that
need to be updated for the new vector shuffle lowering.

After talking to Adam Nemet, Tim Northover, etc., it seems that testing
MC encodings in the same suite as the basic codegen isn't the right
approach. Instead, we're going to want dedicated MC tests for the
encodings. These encodings are starting to get in my way so I wanted to
cut them out early. The total set of instructions that should have
encoding tests added is:

  vpaddd
  vsqrtss
  vsqrtsd
  vmovlhps
  vmovhlps
  valignq
  vbroadcastss

Not too many parts of these tests were even using this. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218932 91177308-0d34-0410-b5e6-96231b3b80d8

constify TargetMachine argument.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218930 91177308-0d34-0410-b5e6-96231b3b80d8

We can grab the options struct from the TargetMachine, no need to
pass it down in the constructor.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218929 91177308-0d34-0410-b5e6-96231b3b80d8

[AVX512] Pull pattern for subvector insert into the instruction definition

No functional change intended.

Very similar to the change I made for subvector extract in r218480.

test/CodeGen/X86/avx512-insert-extract.ll covers this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218928 91177308-0d34-0410-b5e6-96231b3b80d8

[AVX512] Refactor subvector inserts

No functional change.

Very similar to the extract refactoring I did in r218478.

Compared X86.td.expanded before and after.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218927 91177308-0d34-0410-b5e6-96231b3b80d8

[AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rm

Just like in the case of extracts, the refactoring is uncovering some typos in
the code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218926 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-readobj: add a test for COFF import-by-ordinal symbols

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218924 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Modern Book-E cores support sync

Older Book-E cores, such as the PPC 440, support only msync (which has the same
encoding as sync 0), but not any of the other sync forms. Newer Book-E cores,
however, do support sync, and for performance reasons we should allow the use
of the more-general form.

This refactors msync use into its own feature group so that it applies by
default only to older Book-E cores (of the relevant cores, we only have
definitions for the PPC440/450 currently).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218923 91177308-0d34-0410-b5e6-96231b3b80d8

[Power] Improve the expansion of atomic loads/stores

Summary:
Atomic loads and store of up to the native size (32 bits, or 64 for PPC64)
can be lowered to a simple load or store instruction (as the synchronization
is already handled by AtomicExpand, and the atomicity is guaranteed thanks to
the alignment requirements of atomic accesses). This is exactly what this patch
does. Previously, these were implemented by complex
load-linked/store-conditional loops.. an obvious performance problem.

For example, this patch turns
```
define void @store_i8_unordered(i8* %mem) {
  store atomic i8 42, i8* %mem unordered, align 1
  ret void
}
```
from
```
_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    rlwinm r2, r3, 3, 27, 28
    li r4, 42
    xori r5, r2, 24
    rlwinm r2, r3, 0, 0, 29
    li r3, 255
    slw r4, r4, r5
    slw r3, r3, r5
    and r4, r4, r3
LBB4_1:                                 ; =>This Inner Loop Header: Depth=1
    lwarx r5, 0, r2
    andc r5, r5, r3
    or r5, r4, r5
    stwcx. r5, 0, r2
    bne cr0, LBB4_1
; BB#2:
    blr
```
into
```
_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    li r2, 42
    stb r2, 0(r3)
    blr

```
which looks like a pretty clear win to me.

Test Plan:
fixed the tests + new test for indexed accesses + make check-all

Reviewers: jfb, wschmidt, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5587

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218922 91177308-0d34-0410-b5e6-96231b3b80d8

Fix the threshold added in r186434 (a re-apply of r185393) and updaated
to be a ManagedStatic in r218163 to not be a global variable written and
read to from within the innards of SpillPlacement.

This will fix a really scary race condition for anyone that has two
copies of LLVM running spill placement concurrently. Yikes!

This will also fix a really significant compile time hit that r218163
caused because the spill placement threshold read is actually in the
*very* hot path of this code. The memory fence on each read was showing
up as huge compile time regressions when spilling is responsible for
most of the compile time. For example, optimizing sanitized code showed
over 50% compile time regressions here. =/

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218921 91177308-0d34-0410-b5e6-96231b3b80d8

[Stackmaps] Make ithe frame-pointer required for stackmaps.

Do not eliminate the frame pointer if there is a stackmap or patchpoint in the
function. All stackmap references should be FP relative.

This fixes PR21107.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218920 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "DI: Fold constant arguments into a single MDString"

This reverts commit r218914 while I investigate some bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218918 91177308-0d34-0410-b5e6-96231b3b80d8

Rename data -> Data

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218916 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-readobj: print COFF imported symbols

This patch defines a new iterator for the imported symbols.
Make a change to COFFDumper to use that iterator to print
out imported symbols and its ordinals.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218915 91177308-0d34-0410-b5e6-96231b3b80d8

DI: Fold constant arguments into a single MDString

This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218914 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Teach the new vector shuffle lowering to widen floating point
elements as well as integer elements in order to form simpler shuffle
patterns.

This is the primary reason why we were failing to match some of the
2-and-2 floating point shuffles such as PR21140. Even after fixing this
we need to support some extra patterns in the backend in order to match
the resulting X86ISD::UNPCKL nodes into the correct instructions. This
commit should fix PR21140 and includes more comprehensive testing of
insertion patterns in v4 shuffles.

Not all of the added tests are beautiful. For example, we don't have
clever instructions to insert-via-load in the integer domain. There are
also some places where we aren't sufficiently cunning with our use of
movq and movd, but that's future work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218911 91177308-0d34-0410-b5e6-96231b3b80d8