iterating over 2d array: by row or by column? | Liberty BASIC Community Forum

tsh73
Global Moderator

Global Moderator

just in case you wondered Anatoly is my real name

Posts: 1,419

iterating over 2d array: by row or by column? Sept 28, 2020 15:41:42 GMT -5

Quote

Post by tsh73 on Sept 28, 2020 15:41:42 GMT -5

I don't remember where I've got this - but it is like common knowledge that then you need to iterate over 2d array, you better change last index in the inner loop.
Like this

for i = 1 to N
 for j = 1 to N
    s=s+a(i,j)
    next
next

Supposed reason is that numbers in array placed in memory tightly at one another, last index change moves to very next memory cell, and somehow processor cashes several numbers at once. Data locality they say. Or so I remember.

So.
Let's try it with LB
(be warned. On my old box it took 2x by 40 seconds)


kk = 20
N=255
N=100

kk = 1
N=2000

dim a(N,N)

t0=time$("ms")

s=0
for k = 1 to kk
	for i = 1 to N
		for j = 1 to N  'inner loop changes last index
			s=s+a(i,j)
		next
	next
next

t1=time$("ms")
print t1-t0

s=0
for k = 1 to kk
	for j = 1 to N  'outer loop changes last index
		for i = 1 to N
			s=s+a(i,j)
		next
	next
next

t2=time$("ms")
print t2-t1

Surprise!
I don't see any significant difference.
And if I swap i and j loops in this program, I still don't see difference (like, if one was faster, after swapping loops it should change)

Now I took that program to QBASIC - I still have one around. It has no arrays more then 64k so I set N to 100 but compensate with KK.
No difference.

BUT
then running similar code in C$ or PascalABC.NET - it drastically changes execution time (times faster).
So
Apparently they somehow utilize that "data locality", which BASIC does not.
Is there something fundamental preventing it to make better use of data locality?
Is it a bad thing that BASIC does not show difference? Is it could be made perform better with regards of this?

EDIT
put that QBASIC program into QB64. No difference
Increased N to 2000 - still no difference

Put it into C
tcc and MS Visual C 2008
- (N=500 or tcc cannot allocate memory by default)
2x faster with inner look by last index.

Put BASIC program into g95 fortran (I do have lot's of old compilers installed!)
messed with it until it worked
N=2000, KK=100
Execution time depends on order - 2x difference (c#/Pascal gave more then 2x)
but it goes faster then inner loop is on FIRST index, not on the LAST one.
Now, this one leaves me completely puzzled...

EDIT

One just need to ask right questions
Array Indexing and Order
docs.oracle.com/cd/E19957-01/805-4940/z400091044d0/

Array Order

Fortran arrays are stored in column-major order: A(3,2)

A(1,1) A(2,1) A(3,1) A(1,2) A(2,2) A(3,2) A(1,3) A(2,3) A(3,3)
C arrays in row-major order: A[3][2]

A[0][0] A[0][1] A[0][2] A[1][0] A[1][1] A[1][2] A[2][0] A[2][1] A[2][2]

So it explains it. Indeed data locality works for Fortan too.

Last Edit: Sept 28, 2020 15:46:24 GMT -5 by tsh73

Carl Gundel
Administrator

Posts: 1,535

iterating over 2d array: by row or by column? Sept 28, 2020 17:50:20 GMT -5 via mobile

Quote

Post by Carl Gundel on Sept 28, 2020 17:50:20 GMT -5

Sept 28, 2020 15:41:42 GMT -5 tsh73 said:

I don't remember where I've got this - but it is like common knowledge that then you need to iterate over 2d array, you better change last index in the inner loop.
Like this

for i = 1 to N
 for j = 1 to N
    s=s+a(i,j)
    next
next

Supposed reason is that numbers in array placed in memory tightly at one another, last index change moves to very next memory cell, and somehow processor cashes several numbers at once. Data locality they say. Or so I remember.

So.
Let's try it with LB
(be warned. On my old box it took 2x by 40 seconds)


kk = 20
N=255
N=100

kk = 1
N=2000

dim a(N,N)

t0=time$("ms")

s=0
for k = 1 to kk
	for i = 1 to N
		for j = 1 to N  'inner loop changes last index
			s=s+a(i,j)
		next
	next
next

t1=time$("ms")
print t1-t0

s=0
for k = 1 to kk
	for j = 1 to N  'outer loop changes last index
		for i = 1 to N
			s=s+a(i,j)
		next
	next
next

t2=time$("ms")
print t2-t1

Surprise!
I don't see any significant difference.
And if I swap i and j loops in this program, I still don't see difference (like, if one was faster, after swapping loops it should change)

Now I took that program to QBASIC - I still have one around. It has no arrays more then 64k so I set N to 100 but compensate with KK.
No difference.

BUT
then running similar code in C$ or PascalABC.NET - it drastically changes execution time (times faster).
So
Apparently they somehow utilize that "data locality", which BASIC does not.
Is there something fundamental preventing it to make better use of data locality?
Is it a bad thing that BASIC does not show difference? Is it could be made perform better with regards of this?

EDIT
put that QBASIC program into QB64. No difference
Increased N to 2000 - still no difference

Put it into C
tcc and MS Visual C 2008
- (N=500 or tcc cannot allocate memory by default)
2x faster with inner look by last index.

Put BASIC program into g95 fortran (I do have lot's of old compilers installed!)
messed with it until it worked
N=2000, KK=100
Execution time depends on order - 2x difference (c#/Pascal gave more then 2x)
but it goes faster then inner loop is on FIRST index, not on the LAST one.
Now, this one leaves me completely puzzled...

EDIT

One just need to ask right questions
Array Indexing and Order
docs.oracle.com/cd/E19957-01/805-4940/z400091044d0/

Array Order

Fortran arrays are stored in column-major order: A(3,2)

A(1,1) A(2,1) A(3,1) A(1,2) A(2,2) A(3,2) A(1,3) A(2,3) A(3,3)
C arrays in row-major order: A[3][2]

A[0][0] A[0][1] A[0][2] A[1][0] A[1][1] A[1][2] A[2][0] A[2][1] A[2][2]

So it explains it. Indeed data locality works for Fortan too.

There is no more locality in the inner loop. All the variables are declared at the same level. There is no way to optimize that.

-Carl Gundel, author of Liberty BASIC
www.libertybasic.com

Chris Iverson
Administrator

Posts: 940

iterating over 2d array: by row or by column? Sept 28, 2020 19:14:06 GMT -5

Quote

Post by Chris Iverson on Sept 28, 2020 19:14:06 GMT -5

There is a lot more going on here than you think.

First, most BASICs today are interpreted, or JIT-compiled, meaning there is a lot of code in between "your" code and the running, bare-metal system. This goes for the organization of data in memory, as well. Differences in both of these can easily account for the differences you're seeing here.

C/C++ code is compiled directly into the target processor, and is often optimized for such by the compiler, much better than a human can do. This lets the code take advantage of such things as data/cache locality.

Additionally, today's modern CPUs have a bunch of crazy wizardry going on that make even basic assumptions on code vs performance very hard to gauge. See the following two articles for instances where the CPU performs differently from what you'd expect(and can cause you to write slower code if you optimize the wrong way):

Optimization is often counter-intuitive
Non-classical processor behavior: How doing something can be faster than not doing it

Carl Gundel
Administrator

Posts: 1,535

iterating over 2d array: by row or by column? Sept 28, 2020 19:51:15 GMT -5 via mobile

Quote

Post by Carl Gundel on Sept 28, 2020 19:51:15 GMT -5

Sept 28, 2020 19:14:06 GMT -5 Chris Iverson said:

There is a lot more going on here than you think.

First, most BASICs today are interpreted, or JIT-compiled, meaning there is a lot of code in between "your" code and the running, bare-metal system.

That’s right. Liberty BASIC is compiled to bytecodes which are translated dynamically into machine code on demand and cached. There are no ‘registers’ to speak of in the execution model of the virtual machine that Smalltalk runs on. Access to any object is essentially the same regardless of scope. However some kinds of access can be different if an object is new or old or even older. Generational scavenging garbage collection is probably the biggest factor in performance.

-Carl Gundel, author of Liberty BASIC
www.libertybasic.com