Showing posts with label code. Show all posts
Showing posts with label code. Show all posts

Thursday, May 22, 2008

algorithm

some quotes from algorithm class at mit.
you wanna be a good programmer?
you just program every day for two years, you will be an excellent programmer.

you wanna be a world class programmer?
you can program every day for ten years or,
you can program every day for two years and take an algorithm class.

-- charles e. leiserson

we have both balance our mathematical understanding with our engineering common sense.

-- charles e. leiserson

it's not a common algorithm class, at least for me! i did not even learn about asymptotic notation when i was getting my algorithm and programming class.

Saturday, April 26, 2008

octave: speed optimization - fix matrix looping (080426)

with or without following commands on following octave codes will effect obviously different execution time. from with/without preallocated memory, changing looping sequence, vectorizing, until using specific function will produced different execution time. i have eight cases to be compared, they are:
  • case 1: without preallocated memory
for i = 1:N
m1(i, 1:P) = i * ones(1, P);
endfor
elapsed time is 39.3089 seconds.
  • case 2: preallocated memory with 'm = [];'
m2 = [];
for i = 1:N
m2(i, 1:P) = i * ones(1, P);
endfor
elapsed time is 38.7618 seconds.
  • case 3: preallocated memory with 'm(N, P) = 1;'
m3(N, P) = 1;
for i = 1:N
m3(i, 1:P) = i * ones(1, P);
endfor
elapsed time is 0.126078 second.
  • case 4: preallocated memory with 'm = ones(N, P);'
m4 = ones(N, P);
for i = 1:N
m4(i, 1:P) = i * ones (1, P);
endfor
elapsed time is 0.082219 second.
  • case 5: changing the looping sequence from 'for i = 1:N' to 'for i = N:-1:1'
thanks to ben abbott for his answer to my question related to this optimization. he also suggest me to compare with following code:
m5 = [];
for i = N:-1:1
m5(i, 1:P) = i * ones (1, P);
endfor
elapsed time is 0.130076 second.


thanks to przemek klosowski for following new cases (case 6 to 8) and idea to add the conclusion.
  • case 6: nested or more looping with preallocated memory
m6 = ones(N, P);
for i = N:-1:1
for j = P:-1:1
m6(i, j) = i;
endfor
endfor
elapsed time is 15.0053 seconds.
  • case 7: vectorizing using looping
m7 = ones(N, P);
for j = 1:P
m7(:,j) = 1:N;
endfor
elapsed time is 0.053149 second.
  • case 8: vectorizing using 'repmat' function
m8 = repmat((1:N)', 1, P);
elapsed time is 0.045707 second.
  • conclusion
  1. from case 1-4: one way to speed up looping execution is with preallocated memory.
  2. from case 5: another way to speed up looping execution is with changing the looping sequence.
  3. from case 6: looping is very time consuming. so, avoid of using looping with other commands. this can be different from case to case.
  4. from case 7: another way to speed up looping execution is with vectorizing. vectorizing can be faster than preallocated memory or changing the looping sequence.
  5. from case 8: another way to speed up looping execution is using specific function. using specific function can be faster than vectorizing, preallocated memory, or changing the looping sequence.
  • overall test code
# Copyright (C) 2008 by lain.ux
# l411v.ux@gmail.com
# www.l411v.com
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; version 2 of the License.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

# ------------------------------------------------[ define matrix size ]--
N = 900;
P = 1800;
printf("matrix size: m(%d,%d)\n", N, P);

# -------------------------------------------[ display result function ]--
function display_result(m, N, P)
printf("result:\n", N, P);

# checking first column (expected: 1, 1, 1, ..., 1)
printf("m[1,1] = %d\t", m(1,1));
printf("m[1,2] = %d\t", m(1,2));
printf("m[1,3] = %d\t", m(1,3));
printf("m[1,%d] = %d\n", P, m(1,P));

# checking second column (expected: 2, 2, 2, ..., 2)
printf("m[2,1] = %d\t", m(2,1));
printf("m[2,2] = %d\t", m(2,2));
printf("m[2,3] = %d\t", m(2,3));
printf("m[2,%d] = %d\n", P, m(2,P));

# checking Nth column (expected: 900, 900, 900, ..., 900)
printf("m[%d,1] = %d\t", N, m(N,1));
printf("m[%d,2] = %d\t", N, m(N,2));
printf("m[%d,3] = %d\t", N, m(N,3));
printf("m[%d,%d] = %d\n", N, P, m(N,P));
end

# each column of matrix is the sequence 1, 2, ..., N
# ------------------------------------------------------------[ case 1 ]--
printf("\ncase 1: without preallocated memory\n");
tic
for i = 1:N
m1(i, 1:P) = i * ones(1, P);
endfor
toc
display_result(m1, N, P);

# ------------------------------------------------------------[ case 2 ]--
printf("\ncase 2: preallocated memory with 'm = [];'\n");
tic
m2 = [];
for i = 1:N
m2(i, 1:P) = i * ones(1, P);
endfor
toc
display_result(m2, N, P);

# ------------------------------------------------------------[ case 3 ]--
printf("\ncase 3: preallocated memory with 'm(N, P) = 1;'\n");
tic
m3(N, P) = 1;
for i = 1:N
m3(i, 1:P) = i * ones(1, P);
endfor
toc
display_result(m3, N, P);

# ------------------------------------------------------------[ case 4 ]--
printf("\ncase 4: preallocated memory with 'm = ones(N, P);'\n");
tic
m4 = ones(N, P);
for i = 1:N
m4(i, 1:P) = i * ones (1, P);
endfor
toc
display_result(m4, N, P);

# ------------------------------------------------------------[ case 5 ]--
printf("\ncase 5: changing the looping sequence ");
printf("from 'for i = 1:N' to 'for i = N:-1:1'\n");
tic
m5 = [];
for i = N:-1:1
m5(i, 1:P) = i * ones (1, P);
endfor
toc
display_result(m5, N, P);

# ------------------------------------------------------------[ case 6 ]--
printf("\ncase 6: nested or more looping with preallocated memory\n");
tic
m6 = ones(N, P);
for i = N:-1:1
for j = P:-1:1
m6(i, j) = i;
endfor
endfor
toc
display_result(m6, N, P);

# ------------------------------------------------------------[ case 7 ]--
printf("\ncase 7: vectorizing using looping\n");
tic
m7 = ones(N, P);
for j = 1:P
m7(:,j) = 1:N;
endfor
toc
display_result(m7, N, P);

# ------------------------------------------------------------[ case 8 ]--
printf("\ncase 8: vectorizing using 'repmat' function\n");
tic
m8 = repmat((1:N)', 1, P);
toc
display_result(m8, N, P);
  • execution result
$ octave-3.0.1 -q trick_080426.octave
matrix size: m(900,1800)

case 1: without preallocated memory
Elapsed time is 39.3089 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 2: preallocated memory with 'm = [];'
Elapsed time is 38.7618 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 3: preallocated memory with 'm(N, P) = 1;'
Elapsed time is 0.126078 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 4: preallocated memory with 'm = ones(N, P);'
Elapsed time is 0.082219 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 5: changing the looping sequence from 'for i = 1:N' to 'for i = N:-1:1'
Elapsed time is 0.130076 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 6: nested or more looping with preallocated memory
Elapsed time is 15.0053 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 7: vectorizing using looping
Elapsed time is 0.053149 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

case 8: vectorizing using 'repmat' function
Elapsed time is 0.045707 seconds.
result:
m[1,1] = 1 m[1,2] = 1 m[1,3] = 1 m[1,1800] = 1
m[2,1] = 2 m[2,2] = 2 m[2,3] = 2 m[2,1800] = 2
m[900,1] = 900 m[900,2] = 900 m[900,3] = 900 m[900,1800] = 900

Saturday, March 29, 2008

syntax highlighting on blog

since syntax highlighter doesn't support vhdl and verilog code yet, so i manually convert the highlighted text file into html file. following the steps:
  • open your code using kwrite and set the highlighting color as your flavor by selecting "Settings > Configure Editor..." and clicking on "Fonts & Colors"
  • export your code into html file by clicking on "File > Export as HTML..." and "Save"
  • copy the html source of your html file by opening your html file with your internet browser and clicking on "View > Page Source" (if you are using mozilla firefox). copy from tag <pre> to </pre>.

  • paste it into your new posting blog in side "Edit Html" tab box and click back on "Compose" tab box to see the result.
  • add border to make it beautiful by changing html tag <pre> with following settings.
<pre style="border: 1px dashed rgb(64, 64, 64); margin: 0em; padding: 1em; overflow: auto; background-color: rgb(0, 0, 0);">
  • done! following examples of highlighted vhdl and verilog code
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity Const_Unit is
Port (Imm : in std_logic_vector(15 downto 0);
CS : in std_logic;
Const : out std_logic_vector(31 downto 0));
end Const_Unit;

architecture Behavioral of Const_Unit is
begin
-- pass the 16 lsb value.
Const(15 downto 0) <= Imm(15 downto 0);

process (CS, Imm) begin
if (CS = '0') then
-- unsign value
Const(31 downto 16) <= X"0000";
else
-- sign value (sign extension)
for i in 31 downto 16 loop
Const(i) <= Imm(15);
end loop;
end if;
end process;
end Behavioral;
code: highlighted vhdl code

`timescale 1ns / 1ns

`include "../inc/ctr.h"

module ctr(
// input
clk, rst,
// output
out
);
// i/os
input clk, // clock
rst; // reset
output [`WIDTH-1:0] out; // output
// internal signals
wire clk, // clock
rst; // reset
reg [`WIDTH-1:0] out; // output

always @(posedge clk or posedge rst) begin
if (rst)
out <= 0;
else
out <= out + 1;
end

endmodule
code: highlighted verilog code