Generally, array address expressions are reassociated so that
small constants are added last. This is good for expressions
involving the loop induction variable, because loop unrolling
is likely to create expressions like:
(array_base + (i<<2)) + 12 # 12 bytes of object header
(array_base + (i<<2)) + 16
(array_base + (i<<2)) + 20
But this may not be best when the index expression
does not include the induction variable. Loop unrolling
may instead create:
(array_base + exp0) + 12
(array_base + exp1) + 12
(array_base + exp2) + 12
which could be reassociated as:
derived_base = array_base + 12
derived_base + exp0
derived_base + exp1
derived_base + exp2
These forms can use the two-register addressing mode of
a memory instruction to perform the final add.
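As a minimal sketch (plain Java arithmetic, not HotSpot code; the base address and per-iteration expressions below are made-up values), the rewrite is value-preserving because integer addition is associative and commutative, so the constant can be hoisted into a loop-invariant derived base:

```java
public class ReassocDemo {
    public static void main(String[] args) {
        long arrayBase = 0x10000L;        // hypothetical object address
        long[] exps = {40L, 80L, 120L};   // hypothetical non-induction index expressions
        long derivedBase = arrayBase + 12; // computed once, loop-invariant
        for (long exp : exps) {
            long original = (arrayBase + exp) + 12; // constant added last
            long reassoc  = derivedBase + exp;      // constant folded into the base
            if (original != reassoc) throw new AssertionError();
        }
        System.out.println("equal for all exps");
    }
}
```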
In the following test case, the compiler should use a derived
pointer for R_L0 + #12 and then fold the add into the LDUW
addressing mode (two-register version), so that
ADD R_L6,R_L0,R_L0
LDUW [R_L0 + #12],R_L0
becomes
LDUW [R_L6, R_L0],R_L0 # where R_L6 is derived: "+ #12"
public class CRC32 implements Checksum {
    private static final int[] crc_table = {...}; /* 256 constants */

    /* Rolled, use "unsigned byte idiom" */
    private static int updateBytes(int crc, byte[] b,
                                   final int off, final int len) {
        final int[] table = crc_table;
        final int limit = off + len;
        crc = crc ^ 0xffffffff;
        for (int i = off; i < limit; i++) {
            crc = table[(crc ^ ((int)b[i] & 0xff)) & 0xff]
                  ^ (crc >>> 8);
        }
        return crc ^ 0xffffffff;
    }
}
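The 256 table constants are elided above; assuming they are the standard reflected CRC-32 table (polynomial 0xEDB88320, as used by zlib), a self-contained version of the loop can be checked against java.util.zip.CRC32. This is a hedged reconstruction for illustration, not the original test case:

```java
import java.util.zip.CRC32;

public class CrcTableDemo {
    // Assumption: the elided crc_table holds the standard reflected
    // CRC-32 table for polynomial 0xEDB88320.
    static final int[] CRC_TABLE = new int[256];
    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320 ^ (c >>> 1) : (c >>> 1);
            CRC_TABLE[n] = c;
        }
    }

    // Same loop shape as the updateBytes excerpt above.
    static int updateBytes(int crc, byte[] b, int off, int len) {
        final int limit = off + len;
        crc = crc ^ 0xffffffff;
        for (int i = off; i < limit; i++) {
            crc = CRC_TABLE[(crc ^ (b[i] & 0xff)) & 0xff] ^ (crc >>> 8);
        }
        return crc ^ 0xffffffff;
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes();
        CRC32 ref = new CRC32();
        ref.update(data, 0, data.length);
        long mine = updateBytes(0, data, 0, data.length) & 0xffffffffL;
        System.out.println(mine == ref.getValue());
    }
}
```

Every table access here indexes crc_table with an expression that does not involve the induction variable i, which is exactly the pattern described above.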
From Mike Paleczny:
Yes, the normal canonicalization which moves or keeps small
constants near the root of each AddP expression tree doesn't
do the best thing for:
(AddP1 (AddP2 L6 SLL_1) 12)
(AddP1 (AddP2 L6 SLL_2) 12)
(AddP1 (AddP2 L6 SLL_3) 12)
...
I think this is an instance of the general "reassociation" problem.
That is, swapping the ordering will result in a loop-invariant
derived pointer:
(AddP1 (AddP2 L6 12) SLL_1)
(AddP1 (AddP2 L6 12) SLL_2)
(AddP1 (AddP2 L6 12) SLL_3)
...
Once reassociated, all the (AddP2 L6 12) expressions
will fold together.
###@###.### 2005113 18:07:14 GMT
Relates to:
JDK-8154826 AArch64: take better advantage of base + shifted offset addressing mode (Resolved)