improve code insertion in two ways:
1. Only forward subst offsets into loads and stores, not into arbitrary
things, where it will likely become a load.
2. If the source is a cast from pointer, forward subst the cast as well,
allowing us to fold the cast away (improving cases when the cast is
from an alloca or global).
This hasn't been fully tested, but does appear to further reduce register
pressure and improve code. Lets let the testers grind on it a bit. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@24640
91177308-0d34-0410-b5e6-
96231b3b80d8