Post by tsh73 on Sept 28, 2020 15:41:42 GMT -5
I don't remember where I've got this - but it is like common knowledge that then you need to iterate over 2d array, you better change last index in the inner loop.
Like this
Supposed reason is that numbers in array placed in memory tightly at one another, last index change moves to very next memory cell, and somehow processor cashes several numbers at once. Data locality they say. Or so I remember.
So.
Let's try it with LB
(be warned. On my old box it took 2x by 40 seconds)
Surprise!
I don't see any significant difference.
And if I swap i and j loops in this program, I still don't see difference (like, if one was faster, after swapping loops it should change)
Now I took that program to QBASIC - I still have one around. It has no arrays more then 64k so I set N to 100 but compensate with KK.
No difference.
BUT
then running similar code in C$ or PascalABC.NET - it drastically changes execution time (times faster).
So
Apparently they somehow utilize that "data locality", which BASIC does not.
Is there something fundamental preventing it to make better use of data locality?
Is it a bad thing that BASIC does not show difference? Is it could be made perform better with regards of this?
EDIT
put that QBASIC program into QB64. No difference
Increased N to 2000 - still no difference
Put it into C
tcc and MS Visual C 2008
- (N=500 or tcc cannot allocate memory by default)
2x faster with inner look by last index.
Put BASIC program into g95 fortran (I do have lot's of old compilers installed!)
messed with it until it worked
N=2000, KK=100
Execution time depends on order - 2x difference (c#/Pascal gave more then 2x)
but it goes faster then inner loop is on FIRST index, not on the LAST one.
Now, this one leaves me completely puzzled...
EDIT
One just need to ask right questions
Array Indexing and Order
docs.oracle.com/cd/E19957-01/805-4940/z400091044d0/
So it explains it. Indeed data locality works for Fortan too.
Like this
for i = 1 to N
for j = 1 to N
s=s+a(i,j)
next
next
Supposed reason is that numbers in array placed in memory tightly at one another, last index change moves to very next memory cell, and somehow processor cashes several numbers at once. Data locality they say. Or so I remember.
So.
Let's try it with LB
(be warned. On my old box it took 2x by 40 seconds)
kk = 20
N=255
N=100
kk = 1
N=2000
dim a(N,N)
t0=time$("ms")
s=0
for k = 1 to kk
for i = 1 to N
for j = 1 to N 'inner loop changes last index
s=s+a(i,j)
next
next
next
t1=time$("ms")
print t1-t0
s=0
for k = 1 to kk
for j = 1 to N 'outer loop changes last index
for i = 1 to N
s=s+a(i,j)
next
next
next
t2=time$("ms")
print t2-t1
Surprise!
I don't see any significant difference.
And if I swap i and j loops in this program, I still don't see difference (like, if one was faster, after swapping loops it should change)
Now I took that program to QBASIC - I still have one around. It has no arrays more then 64k so I set N to 100 but compensate with KK.
No difference.
BUT
then running similar code in C$ or PascalABC.NET - it drastically changes execution time (times faster).
So
Apparently they somehow utilize that "data locality", which BASIC does not.
Is there something fundamental preventing it to make better use of data locality?
Is it a bad thing that BASIC does not show difference? Is it could be made perform better with regards of this?
EDIT
put that QBASIC program into QB64. No difference
Increased N to 2000 - still no difference
Put it into C
tcc and MS Visual C 2008
- (N=500 or tcc cannot allocate memory by default)
2x faster with inner look by last index.
Put BASIC program into g95 fortran (I do have lot's of old compilers installed!)
messed with it until it worked
N=2000, KK=100
Execution time depends on order - 2x difference (c#/Pascal gave more then 2x)
but it goes faster then inner loop is on FIRST index, not on the LAST one.
Now, this one leaves me completely puzzled...
EDIT
One just need to ask right questions
Array Indexing and Order
docs.oracle.com/cd/E19957-01/805-4940/z400091044d0/
Array Order
Fortran arrays are stored in column-major order: A(3,2)
A(1,1) A(2,1) A(3,1) A(1,2) A(2,2) A(3,2) A(1,3) A(2,3) A(3,3)
C arrays in row-major order: A[3][2]
A[0][0] A[0][1] A[0][2] A[1][0] A[1][1] A[1][2] A[2][0] A[2][1] A[2][2]
Fortran arrays are stored in column-major order: A(3,2)
A(1,1) A(2,1) A(3,1) A(1,2) A(2,2) A(3,2) A(1,3) A(2,3) A(3,3)
C arrays in row-major order: A[3][2]
A[0][0] A[0][1] A[0][2] A[1][0] A[1][1] A[1][2] A[2][0] A[2][1] A[2][2]
So it explains it. Indeed data locality works for Fortan too.