Saturday, April 26, 2008

octave: speed optimization - fix matrix looping (080426)

with or without following commands on following octave codes will effect obviously different execution time. from with/without preallocated memory, changing looping sequence, vectorizing, until using specific function will produced different execution time. i have eight cases to be compared, they are:
  • case 1: without preallocated memory
for i = 1:N
m1(i, 1:P) = i * ones(1, P);
endfor
elapsed time is 39.3089 seconds.
  • case 2: preallocated memory with 'm = [];'
m2 = [];
for i = 1:N
m2(i, 1:P) = i * ones(1, P);
endfor
elapsed time is 38.7618 seconds.
  • case 3: preallocated memory with 'm(N, P) = 1;'
m3(N, P) = 1;
for i = 1:N
m3(i, 1:P) = i * ones(1, P);
endfor
elapsed time is 0.126078 second.
  • case 4: preallocated memory with 'm = ones(N, P);'
m4 = ones(N, P);
for i = 1:N
m4(i, 1:P) = i * ones (1, P);
endfor
elapsed time is 0.082219 second.
  • case 5: changing the looping sequence from 'for i = 1:N' to 'for i = N:-1:1'
thanks to ben abbott for his answer to my question related to this optimization. he also suggest me to compare with following code:
m5 = [];
for i = N:-1:1
m5(i, 1:P) = i * ones (1, P);
endfor
elapsed time is 0.130076 second.


thanks to przemek klosowski for following new cases (case 6 to 8) and idea to add the conclusion.
  • case 6: nested or more looping with preallocated memory
m6 = ones(N, P);
for i = N:-1:1
for j = P:-1:1
m6(i, j) = i;
endfor
endfor
elapsed time is 15.0053 seconds.
  • case 7: vectorizing using looping
m7 = ones(N, P);
for j = 1:P
m7(:,j) = 1:N;
endfor
elapsed time is 0.053149 second.
  • case 8: vectorizing using 'repmat' function
m8 = repmat((1:N)', 1, P);
elapsed time is 0.045707 second.
  • conclusion
  1. from case 1-4: one way to speed up looping execution is with preallocated memory.
  2. from case 5: another way to speed up looping execution is with changing the looping sequence.
  3. from case 6: looping is very time consuming. so, avoid of using looping with other commands. this can be different from case to case.
  4. from case 7: another way to speed up looping execution is with vectorizing. vectorizing can be faster than preallocated memory or changing the looping sequence.
  5. from case 8: another way to speed up looping execution is using specific function. using specific function can be faster than vectorizing, preallocated memory, or changing the looping sequence.
  • overall test code
# Copyright (C) 2008 by lain.ux
# l411v.ux@gmail.com
# www.l411v.com
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; version 2 of the License.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

# ------------------------------------------------[ define matrix size ]--
N = 900;
P = 1800;
printf("matrix size: m(%d,%d)\n", N, P);

# -------------------------------------------[ display result function ]--
function display_result(m, N, P)
printf("result:\n", N, P);

# checking first column (expected: 1, 1, 1, ..., 1)
printf("m[1,1] = %d\t", m(1,1));
printf("m[1,2] = %d\t", m(1,2));
printf("m[1,3] = %d\t", m(1,3));
printf("m[1,%d] = %d\n", P, m(1,P));

# checking second column (expected: 2, 2, 2, ..., 2)
printf("m[2,1] = %d\t", m(2,1));
printf("m[2,2] = %d\t", m(2,2));
printf("m[2,3] = %d\t", m(2,3));
printf("m[2,%d] = %d\n", P, m(2,P));

# checking Nth column (expected: 900, 900, 900, ..., 900)
printf("m[%d,1] = %d\t", N, m(N,1));
printf("m[%d,2] = %d\t", N, m(N,2));
printf("m[%d,3] = %d\t", N, m(N,3));
printf("m[%d,%d] = %d\n", N, P, m(N,P));
end

# each column of matrix is the sequence 1, 2, ..., N
# ------------------------------------------------------------[ case 1 ]--
printf("\ncase 1: without preallocated memory\n");
tic
for i = 1:N
m1(i, 1:P) = i * ones(1, P);
endfor
toc
display_result(m1, N, P);

# ------------------------------------------------------------[ case 2 ]--
printf("\ncase 2: preallocated memory with 'm = [];'\n");
tic
m2 = [];
for i = 1:N
m2(i, 1:P) = i * ones(1, P);
endfor
toc
display_result(m2, N, P);

# ------------------------------------------------------------[ case 3 ]--
printf("\ncase 3: preallocated memory with 'm(N, P) = 1;'\n");
tic
m3(N, P) = 1;
for i = 1:N
m3(i, 1:P) = i * ones(1, P);
endfor
toc
display_result(m3, N, P);

# ------------------------------------------------------------[ case 4 ]--
printf("\ncase 4: preallocated memory with 'm = ones(N, P);'\n");
tic
m4 = ones(N, P);
for i = 1:N
m4(i, 1:P) = i * ones (1, P);
endfor
toc
display_result(m4, N, P);

# ------------------------------------------------------------[ case 5 ]--
printf("\ncase 5: changing the looping sequence ");
printf("from 'for i = 1:N' to 'for i = N:-1:1'\n");
tic
m5 = [];
for i = N:-1:1
m5(i, 1:P) = i * ones (1, P);
endfor
toc
display_result(m5, N, P);

# ------------------------------------------------------------[ case 6 ]--
printf("\ncase 6: nested or more looping with preallocated memory\n");
tic
m6 = ones(N, P);
for i = N:-1:1
for j = P:-1:1
m6(i, j) = i;
endfor
endfor
toc
display_result(m6, N, P);

# ------------------------------------------------------------[ case 7 ]--
printf("\ncase 7: vectorizing using looping\n");
tic
m7 = ones(N, P);
for j = 1:P
m7(:,j) = 1:N;
endfor
toc
display_result(m7, N, P);

# ------------------------------------------------------------[ case 8 ]--
printf("\ncase 8: vectorizing using 'repmat' function\n");
tic
m8 = repmat((1:N)', 1, P);
toc
display_result(m8, N, P);
  • execution result
$ octave-3.0.1 -q trick_080426.octave
matrix size: m(900,1800)

case 1: without preallocated memory
Elapsed time is 39.3089 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 2: preallocated memory with 'm = [];'
Elapsed time is 38.7618 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 3: preallocated memory with 'm(N, P) = 1;'
Elapsed time is 0.126078 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 4: preallocated memory with 'm = ones(N, P);'
Elapsed time is 0.082219 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 5: changing the looping sequence from 'for i = 1:N' to 'for i = N:-1:1'
Elapsed time is 0.130076 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 6: nested or more looping with preallocated memory
Elapsed time is 15.0053 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 7: vectorizing using looping
Elapsed time is 0.053149 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 8: vectorizing using 'repmat' function
Elapsed time is 0.045707 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

No comments: