Hi!
Not long ago I was working on a project that does a huge amount of computation. While testing it with a small console app, I ran into an extremely weird performance issue.
The test app is here:
#include "cnn.h"
#include <stdio.h>
#include <time.h>
int main()
{
    CNN cnn;
    // --- Initialize ---
    cnn.A.SetSize(3,3);
    cnn.B.SetSize(3,3);
    cnn.SetSize(640,480);
    cnn.A(0,0) = 0; cnn.A(0,1) = 0; cnn.A(0,2) = 0;
    cnn.A(1,0) = 1; cnn.A(1,1) = 4; cnn.A(1,2) = 2;
    cnn.A(2,0) = 0; cnn.A(2,1) = 0; cnn.A(2,2) = 0;
    cnn.B(0,0) = 1; cnn.B(0,1) = 4; cnn.B(0,2) = 7;
    cnn.B(1,0) = 2; cnn.B(1,1) = 5; cnn.B(1,2) = 8;
    cnn.B(2,0) = 3; cnn.B(2,1) = 6; cnn.B(2,2) = 9;
    cnn.Check_A();
    // --- Measure performance ---
    clock_t t = clock();                   // clock() returns clock_t, not int
    cnn.maxiter = 100;
    cnn.Process();
    printf("%ld\n", (long)(clock() - t));  // elapsed CPU time in clock ticks
    return 0;
}
All it does is measure the performance of the cnn.Process() method.
This method accesses its own object a LOT: member variables, member structures, etc.
I was experimenting with different routines behind the same interface, so the test console app did not need to be modified. I noticed that a supposedly faster algorithm actually ran SLOWER. At first I thought the new algorithm was just bad, BUT THAT WAS NOT THE CASE!
After a VERY long time I finally tracked it down and realized that it is some sort of ALIGNMENT issue INSIDE THE TEST APP, and NOT in the Process() method!
If I changed the part before the initialization step above to the following:
#include "cnn.h"
#include <stdio.h>
#include <time.h>
CNN cnn;
int main()
{
    // --- Initialize ---
then my new algorithm was faster, as I had expected in the first place.
The only change was moving the declaration of cnn OUTSIDE of the main() function, and that solved the issue.
Now I would really like some answers, because this is frustrating me: how can a local variable inside main() make object accesses so terribly slow? When I moved the declaration outside main(), the program became (no joke) 20 TIMES faster! The problem also goes away if I allocate the object with new, like this:
#include "cnn.h"
#include <stdio.h>
#include <time.h>
int main()
{
    CNN* c = new CNN();   // allocate on the heap instead of the stack
    CNN& cnn = *c;        // keep the name cnn so the rest of main() is unchanged
    //...
The problem only arises when cnn is a local variable inside main() (allocated on the stack).
The second weird thing is that when I used a slightly different CNN class (one with a few extra member variables), the problem also went away. This makes me think it is some sort of alignment issue, yet I cannot figure out why.
I have uploaded a package that reproduces this, so you can see for yourself:
http://digitus.itk.ppke.hu/~oroba/test.zip
Just type make (or mingw32-make), and you get 2 executables:
test_fast.exe
test_slow.exe
Run them from the console and you'll see what I mean. The only difference between the 2 programs is the one described above; see main_fast.cpp and main_slow.cpp.
I also included cnn2.h as an example of what happens when the class is slightly modified. Rename cnn2.h to cnn.h, and you'll see that the program gets faster in the test_slow.exe case.
The difference between cnn.h and cnn2.h is PURELY the class SIZE (number of members), nothing else!
If you have the time, try this experiment. It may also affect YOUR projects, and it may well surprise you as it surprised me. It may cause serious performance issues.
As I do not yet know why this happens, any ideas would be welcome.
--
Greets,
Balázs